
Longitudinal COVID Risk/Reward Behaviors & Mental Health Modeling


This project examines longitudinal changes in risk- and reward-related behaviors and decision-making in a general community sample over the first year following the initial COVID-19 lockdown in the United States, which began in March 2020. The aim is to understand how these behaviors evolved during the pandemic and to identify the factors influencing their trajectories. This study was conducted by the Computational Psychopathology Laboratory at McLean Hospital & Harvard Medical School.

Author: Adrian Medina

Date: October 4, 2024

Overview

Study Design:
The study spans from May 2020 to February 2021, encompassing seven sessions including the baseline. Follow-up assessments were conducted at two weeks, one month, six weeks, two months, ten weeks, and three months post-baseline. This design allows for detailed tracking of changes over a crucial period of adjustment to pandemic-related disruptions.

Data Collection:
Data were collected on a variety of domains, including:

  • Risky Behaviors: As characterized by the Centers for Disease Control and Prevention (CDC).
  • Risk/Reward Appraisals: Assessment of individuals’ perceptions and evaluations of risky situations versus potential gains.
  • Clinical Measures: Focused primarily on mental health, with a few variables related to physical health.
  • Coping & Support Measures: Insights into how individuals cope with stress and their available support systems.
  • Psychometric Measures: Standardized tests to measure psychological variables, offering a window into the psychological state of participants.

Analytic Approach:
The primary analytic strategy involves using multilevel modeling to:

  • Determine the best-fitting models for describing the trajectories of risk/reward behavior variables.
  • Incorporate clinical, psychometric, coping, and support measures to enrich our understanding of the factors influencing risk and reward inclinations or aversions during this unprecedented period.

Goals:
The overarching goals of this project are to:

  • Identify key patterns and changes in risk-taking and reward-seeking behaviors across the studied time frame.
  • Explore the impact of psychological and social factors on these behaviors to better understand the mental health implications of the pandemic.
  • Develop predictive insights that can inform future public health strategies and interventions in similar crises.

Data Frame Initialization

Set up the R environment by configuring the CRAN repository, installing essential packages, and setting base paths.

# Set CRAN repository for consistent package installation
options(repos = c(CRAN = "https://cran.rstudio.com/"))

if (!require(pacman)) install.packages("pacman")
library(pacman)

# Use p_load function to install (if necessary) and load packages
p_load(psych, ggplot2, lme4, Matrix, lmerTest, nlme, dplyr, performance, interactions, sampling, tidyr,
       tidyverse, kableExtra, broom.mixed, gridExtra, sjPlot, ggridges, PupillometryR, 
       patchwork)

# Define base path for streamlined access/replication
base_path <- "~/GitHub/COVID-MultilevelModeling/data_files"

# Set working directory
setwd(base_path)

# Unzip the file in the local directory
unzip("covid_long.csv.zip")

# Read in risk/reward dataset, presently in long format
covid_long <- read.csv("covid_long.csv")

# Read in longitudinal mental health questionnaire data, presently in wide format
mentalhealth_survey_data <- read.csv("FINAL_SURVEY_DATA_2024-06-12 .csv")
Note: the pacman package provides efficient package management when loading multiple packages within the same file.

The source dataset is loaded and time values are adjusted from 1-7 to 0-6. Initialize unified columns for behavior, risk, and reward variables.

Code
# Recode time from values of 1-7 to 0-6
covid_long$time <- covid_long$Index1 - 1

# Define the behavior variable names
beh_variable_names <- c("L1_mail", "HM26_plane", "L2_takeout", "L4_tennis", "LM7_walk_others", 
                    "LM8_golf", "MOD16_bbq", "HM24_eat_rest_in", "LM9_stay_hotel",
                    "LM11_library", "LM12_eat_rest_out", "LM14_playground", "H33_going_movies",
                    "L5_camping", "MOD17_beach", "MOD18_mall", "LM6_grocery", 
                    "MOD20_work_office", "HM28_play_football", "MOD21_pool", "H37_bar", 
                    "MOD22_visit_relatives", "LM13_walk_downtown", "HM23_hair_salon", 
                    "HM25_wedding", "L3_pump_gas", "HM27_play_basketball", 
                    "MOD15_dinner_friend", "H36_attend_religious", "HM29_shake_hands", 
                    "H30_buffet", "H31_gym", "H32_amusement_park", "LM10_doc_office", 
                    "H34_concert", "H35_going_sports", "MOD19_kids_school")

# Create a new '_unified' column (to organize behavior data in non-iterative manner) for each behavior variable
for (var_name in beh_variable_names) {
  covid_long[paste0(var_name, "_unified")] <- NA
}

# Define the base risk variable names
risk_variable_names <- c("L1Y_mail_risk", "HM26Y_plane_risk", "L2Y_takeout_risk", 
                         "L4Y_tennis_risk", "LM7Y_walk_others_risk", "LM8Y_golf_risk", 
                         "MOD16Y_bbq_risk", "HM24Y_eat_rest_in_risk", "LM9Y_stay_hotel_risk", 
                         "LM11Y_library_risk", "LM12Y_eat_rest_out_risk", 
                         "LM14Y_playground_risk", "H33Y_going_movies_risk", "L5Y_camping_risk",
                         "MOD17Y_beach_risk", "MOD18Y_mall_risk", "LM6Y_grocery_risk", 
                         "MOD20Y_work_office_risk", "HM28Y_play_football_risk", 
                         "MOD21Y_pool_risk", "H37Y_bar_risk", "MOD22Y_visit_relatives_risk", 
                         "LM13Y_walk_downtown_risk", "HM23Y_hair_salon_risk", 
                         "HM25Y_wedding_risk", "L3Y_pump_gas_risk", 
                         "HM27Y_play_basketball_risk", "MOD15Y_dinner_friend_risk", 
                         "H36Y_attend_religious_risk", "HM29Y_shake_hands_risk", 
                         "H30Y_buffet_risk", "H31Y_gym_risk", "H32Y_amusement_park_risk", 
                         "LM10Y_doc_office_risk", "H34Y_concert_risk", "H35Y_going_sports_risk"
                         , "MOD19Y_kids_school_risk", "L1N_mail_risk", "HM26N_plane_risk", 
                         "L2N_takeout_risk", "L4N_tennis_risk", "LM7N_walk_others_risk", 
                         "LM8N_golf_risk", "MOD16N_bbq_risk", "HM24N_eat_rest_in_risk", 
                         "LM9N_stay_hotel_risk", "LM11N_library_risk", 
                         "LM12N_eat_rest_out_risk", "LM14N_playground_risk", 
                         "H33N_going_movies_risk", "L5N_camping_risk", "MOD17N_beach_risk", 
                         "MOD18N_mall_risk", "LM6N_grocery_risk", "MOD20N_work_office_risk", 
                         "HM28N_play_football_risk", "MOD21N_pool_risk", "H37N_bar_risk", 
                         "MOD22N_visit_relatives_risk", "LM13N_walk_downtown_risk", 
                         "HM23N_hair_salon_risk", "HM25N_wedding_risk", "L3N_pump_gas_risk", 
                         "HM27N_play_basketball_risk", "MOD15N_dinner_friend_risk", 
                         "H36N_attend_religious_risk", "HM29N_shake_hands_risk", 
                         "H30N_buffet_risk", "H31N_gym_risk", "H32N_amusement_park_risk", 
                         "LM10N_doc_office_risk", "H34N_concert_risk", "H35N_going_sports_risk"
                         , "MOD19N_kids_school_risk")

# Create a new '_unified' column (to organize risk data in non-iterative manner) for each behavior variable
for (var_name in risk_variable_names) {
  covid_long[paste0(var_name, "_unified")] <- NA
}

# Define the base reward variable names
rew_variable_names <- c("L1Y_mail_rew", "HM26Y_plane_rew", "L2Y_takeout_rew", "L4Y_tennis_rew",
                        "LM7Y_walk_others_rew", "LM8Y_golf_rew", "MOD16Y_bbq_rew", 
                        "HM24Y_eat_rest_in_rew",  "LM9Y_stay_hotel_rew", "LM11Y_library_rew", 
                        "LM12Y_eat_rest_out_rew", "LM14Y_playground_rew", 
                        "H33Y_going_movies_rew", "L5Y_camping_rew", "MOD17Y_beach_rew", 
                        "MOD18Y_mall_rew", "LM6Y_grocery_rew", "MOD20Y_work_office_rew", 
                        "HM28Y_play_football_rew", "MOD21Y_pool_rew", "H37Y_bar_rew", 
                        "MOD22Y_visit_relatives_rew", "LM13Y_walk_downtown_rew", 
                        "HM23Y_hair_salon_rew", "HM25Y_wedding_rew", "L3Y_pump_gas_rew", 
                        "HM27Y_play_basketball_rew", "MOD15Y_dinner_friend_rew", 
                        "H36Y_attend_religious_rew", "HM29Y_shake_hands_rew", "H30Y_buffet_rew"
                        , "H31Y_gym_rew", "H32Y_amusement_park_rew", "LM10Y_doc_office_rew", 
                        "H34Y_concert_rew", "H35Y_going_sports_rew", "MOD19Y_kids_school_rew", 
                        "L1N_mail_rew", "HM26N_plane_rew", "L2N_takeout_rew", "L4N_tennis_rew",
                        "LM7N_walk_others_rew", "LM8N_golf_rew", "MOD16N_bbq_rew", 
                        "HM24N_eat_rest_in_rew", "LM9N_stay_hotel_rew", "LM11N_library_rew", 
                        "LM12N_eat_rest_out_rew", "LM14N_playground_rew", 
                        "H33N_going_movies_rew", "L5N_camping_rew", "MOD17N_beach_rew", 
                        "MOD18N_mall_rew", "LM6N_grocery_rew", "MOD20N_work_office_rew", 
                        "HM28N_play_football_rew", "MOD21N_pool_rew", "H37N_bar_rew", 
                        "MOD22N_visit_relatives_rew", "LM13N_walk_downtown_rew", 
                        "HM23N_hair_salon_rew", "HM25N_wedding_rew", "L3N_pump_gas_rew", 
                        "HM27N_play_basketball_rew", "MOD15N_dinner_friend_rew", 
                        "H36N_attend_religious_rew", "HM29N_shake_hands_rew", "H30N_buffet_rew"
                        , "H31N_gym_rew", "H32N_amusement_park_rew", "LM10N_doc_office_rew", 
                        "H34N_concert_rew", "H35N_going_sports_rew", "MOD19N_kids_school_rew")

# Create a new '_unified' column (to organize reward data in non-iterative manner) for each behavior variable
for (var_name in rew_variable_names) {
  covid_long[paste0(var_name, "_unified")] <- NA
}
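As an aside, the three initialization loops above share the same structure and could be collapsed into a single vectorized assignment. The sketch below assumes the three name vectors defined above and relies on base R's ability to create multiple new data frame columns in one assignment:

```r
# A possible consolidation (sketch) of the three loops above: initialize
# every '_unified' column for behavior, risk, and reward variables at once
all_base_names <- c(beh_variable_names, risk_variable_names, rew_variable_names)
covid_long[paste0(all_base_names, "_unified")] <- NA
```

Assigning `NA` to a vector of new column names creates all of the columns in one pass, with `NA` recycled down each column.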

The code below searches the source dataset ‘covid_long’ to determine whether each behavior, risk, and reward variable has its longitudinal counterparts (based on the consistent naming scheme already used in the source dataset). This check confirms that all variables are named correctly and will therefore be included in modeling. The output for each check should be ‘character(0)’.

Code
# Define the naming scheme for follow-up sessions
follow_up_suffixes <- c("", "_two_week", "_monthly", "_six_week", "_sec_monthly", "_ten_week", "_third_monthly")

# Create a list of all expected columns based on the naming scheme
beh_expected_columns <- unlist(lapply(beh_variable_names, function(var) {
  paste0(var, follow_up_suffixes)
}))

# Check which expected behavior columns are missing in the data frame
beh_missing_columns <- setdiff(beh_expected_columns, names(covid_long))

# Output the missing behavior columns (if any)
print(beh_missing_columns)
character(0)
Code
# Create a list of all expected risk columns based on the naming scheme
risk_expected_columns <- unlist(lapply(risk_variable_names, function(var) {
  paste0(var, follow_up_suffixes)
}))

# Check which expected risk columns are missing in the data frame
risk_missing_columns <- setdiff(risk_expected_columns, names(covid_long))

# Output the missing risk columns (if any)
print(risk_missing_columns)
character(0)
Code
# Create a list of all expected reward columns based on the naming scheme
rew_expected_columns <- unlist(lapply(rew_variable_names, function(var) {
  paste0(var, follow_up_suffixes)
}))

# Check which expected reward columns are missing in the data frame
rew_missing_columns <- setdiff(rew_expected_columns, names(covid_long))

# Output the missing reward columns (if any)
print(rew_missing_columns)
character(0)
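The three checks above repeat one pattern, so they could be wrapped in a small helper. This is a hypothetical consolidation (the function name `check_missing` is not part of the original analysis), assuming `follow_up_suffixes` and `covid_long` are already defined as above:

```r
# Hypothetical helper consolidating the three missing-column checks above
check_missing <- function(base_names, data, suffixes = follow_up_suffixes) {
  # Expand each base name by every follow-up suffix, then report any
  # expected columns absent from the data frame
  expected <- unlist(lapply(base_names, paste0, suffixes))
  setdiff(expected, names(data))
}

# Each call should return character(0) if all columns are present
check_missing(beh_variable_names, covid_long)
check_missing(risk_variable_names, covid_long)
check_missing(rew_variable_names, covid_long)
```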

Filter out subjects with completely missing data across all unified variables, recode ‘NA’ values to ‘2’ in behavior variables, and populate unified columns according to specific time points, ensuring each column accurately reflects the data for each session.

Code
# Generate unified variable names
beh_unified_vars <- paste0(beh_variable_names, "_unified")
risk_unified_vars <- paste0(risk_variable_names, "_unified")
rew_unified_vars <- paste0(rew_variable_names, "_unified")

# Combine all unified variable names
all_unified_vars <- c(beh_unified_vars, risk_unified_vars, rew_unified_vars)

# Combine all expected columns
all_expected_columns <- c(beh_expected_columns, risk_expected_columns, rew_expected_columns)

# Create the subset dataframe
covid_long_analytic <- covid_long %>%
  select(record_id, time, baseline_date_complete, all_of(c(all_expected_columns, all_unified_vars)))

# Create complete expected columns per variable type
beh_all_columns <- c(beh_expected_columns, beh_unified_vars)
risk_all_columns <- c(risk_expected_columns, risk_unified_vars)
rew_all_columns <- c(rew_expected_columns, rew_unified_vars)
all_columns <- c(all_expected_columns, all_unified_vars)

# Filter out rows where all entries in 'all_columns' are NA
covid_long_filtered <- covid_long_analytic %>%
  filter(rowSums(!is.na(select(., all_of(all_columns)))) > 0)  # Tested with different filter variables, all of which yielded the same number of subjects and observations, so the most comprehensive option was kept

# Count NA values in each column of the filtered data frame
na_counts <- colSums(is.na(covid_long_filtered))

# Recode NA values to 2 in specified behavior columns using explicit dplyr namespace
covid_long_filtered <- dplyr::mutate(
  covid_long_filtered,
  dplyr::across(dplyr::all_of(beh_expected_columns), ~ dplyr::if_else(is.na(.), 2, .))
)

# Double-check if there are still NA values in these columns
sum_na <- sapply(covid_long_filtered[beh_expected_columns], function(x) sum(is.na(x)))
print(sum_na)  # Shows the count of NA values per column after the replacement
# Explicitly populate the unified behavior columns based on 'time' value
for (var in beh_variable_names) {
  for (i in 0:6) {
    covid_long_filtered <- dplyr::mutate(
      covid_long_filtered,
      !!paste0(var, "_unified") := dplyr::if_else(
        time == i,
        covid_long_filtered[[paste0(var, follow_up_suffixes[i + 1])]],
        covid_long_filtered[[paste0(var, "_unified")]],
        missing = NA
      )
    )
  }
}

# Explicitly populate the unified risk columns based on 'time' value
for (var in risk_variable_names) {
  for (i in 0:6) {
    covid_long_filtered <- dplyr::mutate(
      covid_long_filtered,
      !!paste0(var, "_unified") := dplyr::if_else(
        time == i, 
        covid_long_filtered[[paste0(var, follow_up_suffixes[i + 1])]],
        covid_long_filtered[[paste0(var, "_unified")]],
        missing = NA
      )
    )
  }
}

# Explicitly populate the unified reward columns based on 'time' value
for (var in rew_variable_names) {
  for (i in 0:6) {
    covid_long_filtered <- dplyr::mutate(
      covid_long_filtered,
      !!paste0(var, "_unified") := dplyr::if_else(
        time == i,
        covid_long_filtered[[paste0(var, follow_up_suffixes[i + 1])]],
        covid_long_filtered[[paste0(var, "_unified")]],
        missing = NA
      )
    )
  }
}
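The three populate loops above differ only in the variable set they operate on, so a single helper could stand in for all of them. The sketch below is a hypothetical refactor (not run in this analysis); it assumes `follow_up_suffixes` is defined as above and that `time` is coded 0-6:

```r
# Hypothetical helper: populate '_unified' columns for any set of base
# variable names, using the same time-to-suffix mapping as the loops above
populate_unified <- function(data, base_names, suffixes = follow_up_suffixes) {
  for (var in base_names) {
    unified_col <- paste0(var, "_unified")
    for (i in 0:6) {
      # Rows collected at time point i take their value from the
      # corresponding follow-up column
      rows <- which(data$time == i)
      data[[unified_col]][rows] <- data[[paste0(var, suffixes[i + 1])]][rows]
    }
  }
  data
}

# Equivalent to the three loops above, e.g.:
# covid_long_filtered <- populate_unified(covid_long_filtered, beh_variable_names)
# covid_long_filtered <- populate_unified(covid_long_filtered, risk_variable_names)
# covid_long_filtered <- populate_unified(covid_long_filtered, rew_variable_names)
```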

Behavior Data: Wrangling & Transformations

Analyzes behavioral responses by filtering out rows in which every behavior variable is coded ‘2’ (no response) and calculating counts of affirmative (‘0’) and negative (‘1’) responses. It sums these counts to derive total response counts for each subject and time point, highlighting behavioral patterns.

Code
# Create the behavior subset data frame
covid_behavior_filtered <- covid_long_filtered %>%
  select(record_id, time, baseline_date_complete, all_of(c(beh_unified_vars)))

# Filter out empty rows of subject data (e.g., '2' in all behavioral variables)
covid_behavior_filtered <- covid_behavior_filtered %>%
  # Add a helper column that checks if all values in specified columns are '2'
  mutate(all_twos = rowSums(across(all_of(beh_unified_vars)) == 2) == length(beh_unified_vars)) %>%
  # Filter out rows where the helper column is TRUE
  filter(!all_twos) %>%
  # Optionally remove the helper column if it's no longer needed
  select(-all_twos)

# Count only '0' values (yes) for each row across selected variables, excluding 'record_id' and 'time'
covid_behavior_filtered$yes_counts <- rowSums(covid_behavior_filtered[, -c(1, 2)] == 0, na.rm = TRUE)

# Count only '1' values (no) for each row across selected variables, excluding 'record_id', 'time', and 'yes_counts'
covid_behavior_filtered$no_counts <- rowSums(covid_behavior_filtered[, -c(1, 2, 40)] == 1, na.rm = TRUE)

# Sum the 'yes_counts' and 'no_counts' to get the total response counts
covid_behavior_filtered$response_counts <- covid_behavior_filtered$yes_counts + covid_behavior_filtered$no_counts
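The positional indexing above (e.g., `-c(1, 2, 40)`) is brittle if columns are added or reordered. A name-based sketch using the `beh_unified_vars` vector would compute the same counts, provided the excluded non-behavior columns never contain the 0/1 codes:

```r
# Name-based alternative (sketch) to the positional indexing above:
# count 0s ('yes') and 1s ('no') only across the unified behavior columns
covid_behavior_filtered <- covid_behavior_filtered %>%
  mutate(
    yes_counts = rowSums(across(all_of(beh_unified_vars)) == 0, na.rm = TRUE),
    no_counts  = rowSums(across(all_of(beh_unified_vars)) == 1, na.rm = TRUE),
    response_counts = yes_counts + no_counts
  )
```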

Risk Data: Wrangling & Transformations

Processes risk-related data by excluding subjects with entirely missing risk variables. It computes sums and counts of non-missing risk entries to derive average risk scores adjusted for actual data availability, allowing risk profiles to be assessed longitudinally.

Code
# Create the risk subset data frame
covid_risk_filtered <- covid_long_filtered %>%
  select(record_id, time, baseline_date_complete, all_of(c(risk_unified_vars)))

# Filter out empty rows of subject data (e.g., 'NA' in all risk variables)
covid_risk_filtered <- covid_risk_filtered %>%
  # Add a helper column that checks if all values in specified columns are NA
  mutate(all_na = rowSums(is.na(across(all_of(risk_unified_vars)))) == length(risk_unified_vars)) %>%
  # Filter out rows where the helper column is TRUE
  filter(!all_na) %>%
  # Optionally remove the helper column if it's no longer needed
  select(-all_na)

# Create 'sum_risks' and 'num_risks' columns by summing all risk-related columns for each row, ignoring NA values
covid_risk_filtered$sum_risks <- rowSums(covid_risk_filtered[risk_unified_vars], na.rm = TRUE)
covid_risk_filtered$num_risks <- rowSums(!is.na(covid_risk_filtered[risk_unified_vars]))

# Create 'avg_risk' column for risk variables by dividing 'sum_risks' by 'num_risks'
covid_risk_filtered$avg_risk <- covid_risk_filtered$sum_risks / covid_risk_filtered$num_risks

Reward Data: Wrangling & Transformations

Processes reward-related data by excluding subjects with entirely missing reward variables and computing sums and counts of non-missing reward entries. It derives average reward scores adjusted for data availability to analyze reward dynamics across time points. The section concludes by merging the refined reward data with the previously processed behavior and risk data to form a comprehensive dataset for further analysis.

Code
# Create the reward subset data frame
covid_rew_filtered <- covid_long_filtered %>%
  select(record_id, time, baseline_date_complete, all_of(c(rew_unified_vars)))

# Filter out empty rows of subject data (e.g., 'NA' in all reward variables)
covid_rew_filtered <- covid_rew_filtered %>%
  # Add a helper column that checks if all values in specified columns are NA
  mutate(all_na = rowSums(is.na(across(all_of(rew_unified_vars)))) == length(rew_unified_vars)) %>%
  # Filter out rows where the helper column is TRUE
  filter(!all_na) %>%
  # Optionally remove the helper column if it's no longer needed
  select(-all_na)

# Create 'sum_rew' and 'num_rew' columns by summing all reward-related columns for each row, ignoring NA values
covid_rew_filtered$sum_rew <- rowSums(covid_rew_filtered[rew_unified_vars], na.rm = TRUE)
covid_rew_filtered$num_rew <- rowSums(!is.na(covid_rew_filtered[rew_unified_vars]))

# Create 'avg_rew' column for reward variables by dividing 'sum_rew' by 'num_rew'
covid_rew_filtered$avg_rew <- covid_rew_filtered$sum_rew / covid_rew_filtered$num_rew

# Merge covid_behavior_filtered with covid_risk_filtered
covid_long_intermediate <- dplyr::left_join(covid_behavior_filtered, covid_risk_filtered, by = c("record_id", "time"))

# Merge the intermediate result with covid_rew_filtered
covid_long_final <- dplyr::left_join(covid_long_intermediate, covid_rew_filtered, by = c("record_id", "time"))
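One subtlety of the `avg_risk` and `avg_rew` computations above: dividing a sum by a count yields `NaN` whenever the count is zero. The all-NA row filters guarantee positive denominators here, so the originals are safe; the sketch below is a defensive variant (the helper name `safe_avg` is hypothetical) for reuse in contexts without that guarantee:

```r
# Defensive division helper (sketch): returns NA instead of NaN when the
# denominator is zero. Not needed above, where all-NA rows are pre-filtered.
safe_avg <- function(total, count) {
  ifelse(count > 0, total / count, NA_real_)
}

# e.g. covid_risk_filtered$avg_risk <- safe_avg(covid_risk_filtered$sum_risks,
#                                               covid_risk_filtered$num_risks)
```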

Risk & Reward Data: Stratification & Categorization

Categorizes risk and reward variables into ‘Yes’ and ‘No’ groups and then further stratifies them into ‘Low’, ‘Moderate’, and ‘High’ risk categories based on predefined item numbers. Computes the sum and average scores for each group by risk category, enabling detailed analysis of responses within the stratified groups.

Code
# Split the risk variables into those containing 'Y' and those containing 'N'
risk_Y_unified_vars <- risk_unified_vars[grep("\\dY_", risk_unified_vars)]
risk_N_unified_vars <- risk_unified_vars[grep("\\dN_", risk_unified_vars)]

# Split the reward variables into those containing 'Y' and those containing 'N'
rew_Y_unified_vars <- rew_unified_vars[grep("\\dY_", rew_unified_vars)]
rew_N_unified_vars <- rew_unified_vars[grep("\\dN_", rew_unified_vars)]

# Calculating sum scores for 'yes' and 'no' responses for risk variables
covid_long_final <- covid_long_final %>%
  mutate(
    sum_risk_Y = rowSums(select(., all_of(risk_Y_unified_vars)), na.rm = TRUE),
    sum_risk_N = rowSums(select(., all_of(risk_N_unified_vars)), na.rm = TRUE),
    num_risk_Y = rowSums(!is.na(select(., all_of(risk_Y_unified_vars)))),
    num_risk_N = rowSums(!is.na(select(., all_of(risk_N_unified_vars)))),
    avg_risk_Y = sum_risk_Y / num_risk_Y,
    avg_risk_N = sum_risk_N / num_risk_N
  )

# Calculating sum scores for 'yes' and 'no' responses for reward variables
covid_long_final <- covid_long_final %>%
  mutate(
    sum_rew_Y = rowSums(select(., all_of(rew_Y_unified_vars)), na.rm = TRUE),
    sum_rew_N = rowSums(select(., all_of(rew_N_unified_vars)), na.rm = TRUE),
    num_rew_Y = rowSums(!is.na(select(., all_of(rew_Y_unified_vars)))),
    num_rew_N = rowSums(!is.na(select(., all_of(rew_N_unified_vars)))),
    avg_rew_Y = sum_rew_Y / num_rew_Y,
    avg_rew_N = sum_rew_N / num_rew_N
  )

# Extract item numbers from variable names
item_numbers <- as.numeric(gsub("\\D", "", all_unified_vars))

# Classify each variable into a risk category based on item number
risk_categories <- ifelse(item_numbers %in% 1:8, "Low",
                          ifelse(item_numbers %in% 9:22, "Moderate", "High"))

# Combine variable names with their categories
behavior_categories <- data.frame(variable = all_unified_vars, category = risk_categories)

# Calculating sums and averages for risk categories, 'Yes' and 'No' variables
low_risk_vars <- behavior_categories$variable[behavior_categories$category == "Low"]
moderate_risk_vars <- behavior_categories$variable[behavior_categories$category == "Moderate"]
high_risk_vars <- behavior_categories$variable[behavior_categories$category == "High"]
low_risk_Y_vars <- risk_Y_unified_vars[risk_Y_unified_vars %in% low_risk_vars]
low_risk_N_vars <- risk_N_unified_vars[risk_N_unified_vars %in% low_risk_vars]
moderate_risk_Y_vars <- risk_Y_unified_vars[risk_Y_unified_vars %in% moderate_risk_vars]
moderate_risk_N_vars <- risk_N_unified_vars[risk_N_unified_vars %in% moderate_risk_vars]
high_risk_Y_vars <- risk_Y_unified_vars[risk_Y_unified_vars %in% high_risk_vars]
high_risk_N_vars <- risk_N_unified_vars[risk_N_unified_vars %in% high_risk_vars]

# Calculating sums and averages for reward categories, 'Yes' and 'No' variables
low_rew_vars <- behavior_categories$variable[behavior_categories$category == "Low"]
moderate_rew_vars <- behavior_categories$variable[behavior_categories$category == "Moderate"]
high_rew_vars <- behavior_categories$variable[behavior_categories$category == "High"]
low_rew_Y_vars <- rew_Y_unified_vars[rew_Y_unified_vars %in% low_rew_vars]
low_rew_N_vars <- rew_N_unified_vars[rew_N_unified_vars %in% low_rew_vars]
moderate_rew_Y_vars <- rew_Y_unified_vars[rew_Y_unified_vars %in% moderate_rew_vars]
moderate_rew_N_vars <- rew_N_unified_vars[rew_N_unified_vars %in% moderate_rew_vars]
high_rew_Y_vars <- rew_Y_unified_vars[rew_Y_unified_vars %in% high_rew_vars]
high_rew_N_vars <- rew_N_unified_vars[rew_N_unified_vars %in% high_rew_vars]

# Adding risk calculations to the dataframe
covid_long_final <- covid_long_final %>%
  mutate(
    # Low risk calculations
    sum_low_risk_Y = rowSums(select(., all_of(low_risk_Y_vars)), na.rm = TRUE),
    sum_low_risk_N = rowSums(select(., all_of(low_risk_N_vars)), na.rm = TRUE),
    num_low_risk_Y = rowSums(!is.na(select(., all_of(low_risk_Y_vars)))),
    num_low_risk_N = rowSums(!is.na(select(., all_of(low_risk_N_vars)))),
    avg_low_risk_Y = sum_low_risk_Y / num_low_risk_Y,
    avg_low_risk_N = sum_low_risk_N / num_low_risk_N,

    # Moderate risk calculations
    sum_moderate_risk_Y = rowSums(select(., all_of(moderate_risk_Y_vars)), na.rm = TRUE),
    sum_moderate_risk_N = rowSums(select(., all_of(moderate_risk_N_vars)), na.rm = TRUE),
    num_moderate_risk_Y = rowSums(!is.na(select(., all_of(moderate_risk_Y_vars)))),
    num_moderate_risk_N = rowSums(!is.na(select(., all_of(moderate_risk_N_vars)))),
    avg_moderate_risk_Y = sum_moderate_risk_Y / num_moderate_risk_Y,
    avg_moderate_risk_N = sum_moderate_risk_N / num_moderate_risk_N,

    # High risk calculations
    sum_high_risk_Y = rowSums(select(., all_of(high_risk_Y_vars)), na.rm = TRUE),
    sum_high_risk_N = rowSums(select(., all_of(high_risk_N_vars)), na.rm = TRUE),
    num_high_risk_Y = rowSums(!is.na(select(., all_of(high_risk_Y_vars)))),
    num_high_risk_N = rowSums(!is.na(select(., all_of(high_risk_N_vars)))),
    avg_high_risk_Y = sum_high_risk_Y / num_high_risk_Y,
    avg_high_risk_N = sum_high_risk_N / num_high_risk_N
  )

# Adding reward calculations to the dataframe
covid_long_final <- covid_long_final %>%
  mutate(
    # Low reward calculations
    sum_low_rew_Y = rowSums(select(., all_of(low_rew_Y_vars)), na.rm = TRUE),
    sum_low_rew_N = rowSums(select(., all_of(low_rew_N_vars)), na.rm = TRUE),
    num_low_rew_Y = rowSums(!is.na(select(., all_of(low_rew_Y_vars)))),
    num_low_rew_N = rowSums(!is.na(select(., all_of(low_rew_N_vars)))),
    avg_low_rew_Y = sum_low_rew_Y / num_low_rew_Y,
    avg_low_rew_N = sum_low_rew_N / num_low_rew_N,

    # Moderate reward calculations
    sum_moderate_rew_Y = rowSums(select(., all_of(moderate_rew_Y_vars)), na.rm = TRUE),
    sum_moderate_rew_N = rowSums(select(., all_of(moderate_rew_N_vars)), na.rm = TRUE),
    num_moderate_rew_Y = rowSums(!is.na(select(., all_of(moderate_rew_Y_vars)))),
    num_moderate_rew_N = rowSums(!is.na(select(., all_of(moderate_rew_N_vars)))),
    avg_moderate_rew_Y = sum_moderate_rew_Y / num_moderate_rew_Y,
    avg_moderate_rew_N = sum_moderate_rew_N / num_moderate_rew_N,

    # High reward calculations
    sum_high_rew_Y = rowSums(select(., all_of(high_rew_Y_vars)), na.rm = TRUE),
    sum_high_rew_N = rowSums(select(., all_of(high_rew_N_vars)), na.rm = TRUE),
    num_high_rew_Y = rowSums(!is.na(select(., all_of(high_rew_Y_vars)))),
    num_high_rew_N = rowSums(!is.na(select(., all_of(high_rew_N_vars)))),
    avg_high_rew_Y = sum_high_rew_Y / num_high_rew_Y,
    avg_high_rew_N = sum_high_rew_N / num_high_rew_N
  )
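The six `mutate()` blocks above all follow one sum/count/average pattern per variable group. A hypothetical base-R helper (the name `add_group_stats` is not part of the original analysis) could generate each triple from a variable set and a label:

```r
# Hypothetical helper: compute sum, non-missing count, and average for a
# set of columns, writing sum_<label>, num_<label>, and avg_<label>
add_group_stats <- function(data, vars, label) {
  block <- data[, vars, drop = FALSE]
  data[[paste0("sum_", label)]] <- rowSums(block, na.rm = TRUE)
  data[[paste0("num_", label)]] <- rowSums(!is.na(block))
  data[[paste0("avg_", label)]] <- data[[paste0("sum_", label)]] /
    data[[paste0("num_", label)]]
  data
}

# Equivalent to one stanza of the mutate blocks above, e.g.:
# covid_long_final <- add_group_stats(covid_long_final, low_risk_Y_vars, "low_risk_Y")
```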

Mental Health Data: Wrangling & Transformations

Converts mental health survey data from wide to long format for analytical flexibility. It cleans and summarizes the data, mapping each variable to its respective time points defined by suffixes, calculating the number of iterations and value ranges for each variable. This process involves removing redundant rows and establishing a clear structure to facilitate longitudinal analyses, incorporating a ‘time_factor’ column to align data points with specific survey phases.

Please note: the mental health data requires a slightly different approach, with its own transformations, because each variable occurs at a different frequency in the data collection sequence.

Code
# Extract the column names
column_names <- names(mentalhealth_survey_data)

# Create a list where each base variable name is mapped to its original column names after removing suffixes
base_variable_names <- str_remove_all(column_names, 
                                      pattern = "_baseline|_two_week|_monthly|_six_week|_second_monthly|_ten_week|_third_monthly")

# Create a unique list of base variable names
unique_base_variables <- unique(base_variable_names)

# Create a data frame to store the summary of each variable
variable_summary <- data.frame(variable = unique_base_variables)

# Calculate the number of iterations and value ranges
variable_summary <- variable_summary %>%
  rowwise() %>%
  mutate(
    iterations = length(grep(paste0("^", variable, "(_|$)"), column_names)),
    value_range = {
      cols <- names(mentalhealth_survey_data)[grep(paste0("^", variable, "(_|$)"), column_names)]
      min_val <- min(unlist(mentalhealth_survey_data[cols], use.names = FALSE), na.rm = TRUE)
      max_val <- max(unlist(mentalhealth_survey_data[cols], use.names = FALSE), na.rm = TRUE)
      paste(min_val, max_val, sep = "-")
    }
  )

# Manual edits to the data frame iteration values to account for variables with similar base names
variable_summary <- variable_summary %>%
  mutate(
    iterations = case_when(
      variable == "ami" ~ "7",
      variable == "bis" ~ "4",
      variable == "dsm_anger" ~ "4",
      variable == "dsm_anhedonia" ~ "4",
      variable == "dsm_anxiety" ~ "4",
      variable == "dsm_depression" ~ "4",
      variable == "dsm_dissociation" ~ "4",
      variable == "dsm_dysphoria" ~ "4",
      variable == "dsm_mania" ~ "4",
      variable == "dsm_memory" ~ "4",
      variable == "dsm_personality" ~ "4",
      variable == "dsm_psychosis" ~ "4",
      variable == "dsm_repetition" ~ "4",
      variable == "dsm_sleep" ~ "4",
      variable == "dsm_somatic" ~ "4",
      variable == "dsm_suicidality" ~ "4",
      variable == "dsm_substance" ~ "4",
      variable == "dsm_total" ~ "4",
      variable == "mpss" ~ "4",
      variable == "panas_positive" ~ "1",
      variable == "panas_negative" ~ "1",
      variable == "promis_pain" ~ "2",
      variable == "promis_physical" ~ "2",
      TRUE ~ as.character(iterations)  # Keep existing iterations for other variables
    )
  ) %>%
  filter(variable != "record_id")  # Remove the row where variable is 'record_id'

# Define time suffixes and factors
time_suffixes <- list(
  `7` = c("_baseline", "_two_week", "_monthly", "_six_week", "_second_monthly", "_ten_week", "_third_monthly"),
  `4` = c("_baseline", "_monthly", "_second_monthly", "_third_monthly"),
  `2` = c("_baseline", "_two_week"),
  `1` = c("_baseline")
)

time_factors <- list(
  `7` = c("Baseline", "Two Week Follow-Up", "One Month Follow-Up", "Six Week Follow-Up", "Two Month Follow-Up", "Ten Week Follow-Up", "Three Month Follow-Up"),
  `4` = c("Baseline", "One Month Follow-Up", "Two Month Follow-Up", "Three Month Follow-Up"),
  `2` = c("Baseline", "Two Week Follow-Up"),
  `1` = c("Baseline")
)

# Transforming mentalhealth_survey_data from wide to long format and creating 'time_factor'
mentalhealth_survey_data_long <- mentalhealth_survey_data %>%
  pivot_longer(
    cols = -record_id,  # Assuming 'record_id' is the identifier and should not be pivoted
    names_to = "variable_time",
    values_to = "value"
  ) %>%
  mutate(
    time = case_when(
      str_detect(variable_time, "_baseline") ~ 0,
      str_detect(variable_time, "_two_week") ~ 1,
      str_detect(variable_time, "_second_monthly") ~ 4,  # Check before the plain "_monthly" pattern
      str_detect(variable_time, "_third_monthly") ~ 6,
      str_detect(variable_time, "_monthly") ~ 2,
      str_detect(variable_time, "_six_week") ~ 3,
      str_detect(variable_time, "_ten_week") ~ 5,
      TRUE ~ NA_real_  # All case_when branches must share one type; NA_real_ matches the numeric codes
    ),
    variable = str_remove(variable_time, "_baseline|_two_week|_monthly|_six_week|_second_monthly|_ten_week|_third_monthly"),
    time_factor = factor(
      case_when(
        time == 0 ~ "Baseline",
        time == 1 ~ "Two Week Follow-Up",
        time == 2 ~ "One Month Follow-Up",
        time == 3 ~ "Six Week Follow-Up",
        time == 4 ~ "Two Month Follow-Up",
        time == 5 ~ "Ten Week Follow-Up",
        time == 6 ~ "Three Month Follow-Up",
        TRUE ~ NA_character_  # Handles any unexpected cases
      ),
      levels = c("Baseline", "Two Week Follow-Up", "One Month Follow-Up", "Six Week Follow-Up", "Two Month Follow-Up", "Ten Week Follow-Up", "Three Month Follow-Up")
    )
  ) %>%
  select(record_id, time, time_factor, variable, value)  # Reorder and select only necessary columns

# Omit specific time points for DSM variables
dsm_vars <- c("dsm_anger", "dsm_anhedonia", "dsm_anxiety", "dsm_depression",
              "dsm_dissociation", "dsm_dysphoria", "dsm_mania", "dsm_memory",
              "dsm_personality", "dsm_psychosis", "dsm_repetition", "dsm_sleep",
              "dsm_somatic", "dsm_suicidality", "dsm_substance", "dsm_total")

omit_times <- c("Two Week Follow-Up", "Six Week Follow-Up", "Ten Week Follow-Up")

# Filter out unwanted time points for DSM variables
mentalhealth_survey_data_long <- mentalhealth_survey_data_long %>%
  filter(!(variable %in% dsm_vars & time_factor %in% omit_times))

# Keep specific time points for PROMIS variables
promis_vars <- c("promis_pain", "promis_physical")
keep_promis_times <- c("Baseline", "Two Week Follow-Up")

# Filter and keep specific time points for PROMIS variables
mentalhealth_survey_data_long <- mentalhealth_survey_data_long %>%
  filter(!(variable %in% promis_vars) | (variable %in% promis_vars & time_factor %in% keep_promis_times))

# Keep specific time points for PANAS variables
panas_vars <- c("panas_positive", "panas_negative")
keep_panas_times <- "Baseline"

# Filter and keep specific time points for PANAS variables
mentalhealth_survey_data_long <- mentalhealth_survey_data_long %>%
  filter(!(variable %in% panas_vars) | (variable %in% panas_vars & time_factor == keep_panas_times))

Descriptive Statistics & Visualizations

Please use the data frame ‘covid_long_final’ for any descriptive statistics or visualizations pertaining to behavior, risk, and/or reward data. Note: this data frame is already in long format.

This section extracts descriptive statistics for the behavior, risk, reward, and mental health data at each time point, consolidates the outputs into a single data frame, reformats it for readability, and displays it as a multi-tiered HTML table with a header spanning each time point.
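As a minimal sketch of the grouped-description pattern used here — `psych::describeBy()` output collapsed into one data frame — consider hypothetical toy data (not the study variables):

```r
library(dplyr)
library(psych)

# Toy long-format data: one score measured at three time points (hypothetical)
toy <- data.frame(
  time  = rep(0:2, each = 4),
  score = c(10, 12, 11, 13, 9, 8, 10, 9, 7, 6, 8, 7)
)

# describeBy() returns a list with one 'describe' entry per group level
desc <- describeBy(toy$score, group = toy$time)

# Collapse the list into one data frame; .id keeps the group label as 'time'
desc_df <- bind_rows(lapply(desc, as.data.frame), .id = "time")

desc_df[, c("time", "n", "mean", "sd", "min", "max")]
```

Passing `.id = "time"` is one way to retain each group's label directly, rather than re-deriving the time index after binding.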

Code
# Reorder columns in final data frame to verify baseline completion dates match
covid_long_final <- covid_long_final %>%
  select(record_id, time, baseline_date_complete, baseline_date_complete.x, baseline_date_complete.y, everything())

# Following verification, remove the duplicate columns
covid_long_final <- covid_long_final %>%
  select(-baseline_date_complete.x, -baseline_date_complete.y)

# Add new column assigning group label based on baseline completion date and ensure it's a factor
covid_long_final <- covid_long_final %>%
  mutate(baseline_date_complete = mdy(baseline_date_complete),  # Parse mm/dd/yyyy strings into Dates
         covid_group = case_when(
           baseline_date_complete >= mdy("05/01/2020") & baseline_date_complete <= mdy("11/30/2020") ~ "first_wave",
           baseline_date_complete >= mdy("12/01/2020") & baseline_date_complete <= mdy("2/28/2021") ~ "second_wave",
           TRUE ~ NA_character_  # for dates outside the range or NA
         ),
         covid_group = factor(covid_group, levels = c("first_wave", "second_wave"))) %>%
  relocate(covid_group, .after = baseline_date_complete)  # Move covid_group right after baseline_date_complete

# Function to convert describeBy output to data frame
describe_to_df <- function(describe_obj) {
  bind_rows(lapply(describe_obj, as.data.frame))
}

# Build the risk/reward variable names in display order: for each domain
# (risk, reward) and level (overall, low, moderate, high), the sum and
# average of 'Y' and 'N' responses
stat_vars <- unlist(lapply(c("risk", "rew"), function(domain) {
  unlist(lapply(c("", "low_", "moderate_", "high_"), function(level) {
    paste0(c("sum_", "sum_", "avg_", "avg_"), level, domain, c("_Y", "_N", "_Y", "_N"))
  }))
}))

# Describe each variable by time point, convert to a data frame, and stack
# the results into one combined data frame
combined_df <- bind_rows(lapply(stat_vars, function(v) {
  describeBy(covid_long_final[[v]], group = covid_long_final$time) %>%
    describe_to_df() %>%
    mutate(variable = v)
}))

# Rename 'vars' to 'time'
combined_df <- combined_df %>%
  dplyr::rename(time = vars)

# describeBy's 'vars' column is a variable index, not the grouping value, so
# overwrite it with the 0-6 time codes, repeating across each variable's seven rows
combined_df <- combined_df %>%
  dplyr::mutate(time = rep(0:6, length.out = n()))

# Spread the combined data frame for a nicer format
final_table <- combined_df %>%
  select(variable, time, n, mean, sd, median, min, max)

# Pivot the data to a wider format with a specific order for statistics
wide_df <- final_table %>%
  pivot_wider(
    names_from = time,  # Assuming 'time' column is correctly set as a factor or numeric already
    values_from = c(n, mean, sd, median, min, max),
    names_sep = "_"  # Creating column names like mean_0, sd_0, etc.
  ) %>%
  select(variable,
         starts_with("n"), starts_with("mean"), starts_with("sd"), starts_with("median"),
         starts_with("min"), starts_with("max"))

# Generate the ordered column names dynamically based on the desired pattern
stats_order <- c("n", "mean", "sd", "median", "min", "max")
time_points <- 0:6  # Assuming time points are from 0 to 6
ordered_column_names <- unlist(lapply(time_points, function(t) paste(stats_order, t, sep = "_")))

# Select columns in the desired order
wide_df <- wide_df %>%
  select(variable, all_of(ordered_column_names))

# Now, strip the time suffixes from the column names for display purposes only
clean_column_names <- rep(stats_order, times = length(time_points))

# Rename the columns for display
names(wide_df)[-1] <- clean_column_names  # Excluding the first column which is 'variable'

# Create the formatted table with specified header spans
formatted_table <- kable(wide_df, "html", escape = FALSE) %>%
  kable_styling(full_width = FALSE, position = "left") %>%
  add_header_above(c(" " = 1, 
                     "Baseline" = length(stats_order), 
                     "Two Week Follow-Up" = length(stats_order), 
                     "One Month Follow-Up" = length(stats_order), 
                     "Six Week Follow-Up" = length(stats_order), 
                     "Two Month Follow-Up" = length(stats_order), 
                     "Ten Week Follow-Up" = length(stats_order), 
                     "Three Month Follow-Up" = length(stats_order))) %>%
  column_spec(1, bold = TRUE, border_right = TRUE) %>%
  scroll_box(width = "100%", height = "500px")

# Print the formatted table
formatted_table
(Rendered output: scrollable HTML table. The six statistics below repeat once per time point, in order: Baseline, Two Week, One Month, Six Week, Two Month, Ten Week, and Three Month Follow-Up.)
variable n mean sd median min max n mean sd median min max n mean sd median min max n mean sd median min max n mean sd median min max n mean sd median min max n mean sd median min max
sum_risk_Y 1485 274.23030 226.28174 227.00000 0 2444 1018 237.40864 193.01048 197.50000 0 1956.00000 895 227.56313 192.02669 190.00000 0 1850.000 861 229.06156 209.19431 192.00000 0 2806 747 224.81660 196.28454 185.00000 0 1702 726 224.40909 200.20983 183.50000 0 2295 706 227.97309 205.08190 194.00000 0 2545.00000
sum_risk_N 1485 1000.81414 839.38860 934.00000 0 3024 1018 991.19548 829.03130 898.50000 0 3015.00000 895 890.02793 802.51098 761.00000 0 3223.000 861 822.31010 779.99528 662.00000 0 3273 747 789.20080 771.85539 608.00000 0 3187 726 732.05096 740.96750 508.50000 0 3238 706 742.62890 729.84526 572.50000 0 3286.00000
avg_risk_Y 1477 36.73360 18.31518 36.44444 0 100 1016 36.80283 18.46420 36.77083 0 96.33333 892 36.52524 18.20358 36.46429 0 99.625 856 37.08307 19.07084 37.63333 0 100 740 36.93710 18.51180 36.26786 0 100 720 36.03884 18.28349 35.55000 0 100 703 35.60483 18.29127 35.25000 0 100.00000
avg_risk_N 1294 59.61682 21.58662 63.66026 0 100 882 60.10240 21.35542 64.00000 0 100.00000 737 60.81560 20.81322 64.25000 0 100.000 708 59.50904 20.68344 62.98214 0 100 603 59.71899 20.73033 62.57692 0 100 582 59.18311 21.25296 63.00000 0 100 579 58.28659 20.71050 61.23529 0 100.00000
sum_low_risk_Y 1485 115.84983 74.44552 108.00000 0 480 1018 112.51081 72.42132 105.00000 0 477.00000 895 109.46369 74.00867 100.00000 0 497.000 861 112.24274 76.20841 100.00000 0 635 747 109.58635 75.76697 100.00000 0 500 726 106.90083 73.09543 95.00000 0 523 706 107.61331 74.44842 96.50000 0 578.00000
sum_low_risk_N 1485 71.12458 85.35081 42.00000 0 485 1018 70.75442 83.67033 43.50000 0 490.00000 895 62.70168 81.42799 28.00000 0 501.000 861 61.60046 83.28922 26.00000 0 414 747 60.53414 85.17322 25.00000 0 487 726 54.14050 78.67375 20.50000 0 566 706 53.03824 75.87894 24.00000 0 445.00000
avg_low_risk_Y 1470 30.61031 18.16080 29.00000 0 100 1012 31.43378 18.16988 30.40000 0 96.50000 889 30.97208 18.13344 29.75000 0 100.000 852 32.52859 19.06190 30.90000 0 100 739 32.05328 18.61163 31.33333 0 100 718 31.45945 18.36048 29.58333 0 100 699 30.75851 18.42192 29.50000 0 100.00000
avg_low_risk_N 1093 31.68997 21.64997 29.33333 0 100 721 33.10412 21.31875 31.42857 0 100.00000 601 32.95518 21.99320 30.00000 0 100.000 552 33.92596 21.23626 32.00000 0 96 479 34.56290 22.24221 32.00000 0 100 451 34.42303 22.23452 32.00000 0 100 446 33.45676 21.39004 31.50000 0 100.00000
sum_moderate_risk_Y 1485 97.74411 110.16919 68.00000 0 1040 1018 75.32024 91.94670 50.00000 0 725.00000 895 72.56648 89.56590 50.00000 0 700.000 861 70.15215 93.06110 50.00000 0 1025 747 69.13922 91.70351 50.00000 0 800 726 69.69835 93.78388 50.00000 0 991 706 69.74788 90.68383 50.00000 0 934.00000
sum_moderate_risk_N 1485 378.58249 331.06133 350.00000 0 1272 1018 377.79175 331.78267 348.50000 0 1292.00000 895 336.00000 317.36767 270.00000 0 1373.000 861 313.29268 307.91660 243.00000 0 1400 747 302.49398 306.97741 227.00000 0 1345 726 275.56474 290.90763 178.50000 0 1355 706 284.75354 292.56185 210.00000 0 1344.00000
avg_moderate_risk_Y 1208 44.36928 23.00680 46.58333 0 100 764 43.93974 23.38548 46.58333 0 100.00000 664 45.08922 23.43286 47.00000 0 100.000 624 44.65850 23.38186 50.00000 0 100 535 44.20987 23.51523 46.75000 0 100 520 42.89716 22.91906 43.25000 0 100 531 42.05401 22.84972 43.50000 0 100.00000
avg_moderate_risk_N 1145 57.57800 20.66400 59.90000 0 100 781 58.15952 20.01155 60.50000 0 100.00000 662 58.01828 20.15197 60.30303 0 100.000 624 57.88082 19.80410 59.87500 0 100 529 57.69474 19.90693 59.88889 0 100 498 56.71318 20.18615 59.31667 0 100 505 55.34864 20.37588 57.14286 0 100.00000
sum_high_risk_Y 1485 60.63636 101.33752 22.00000 0 1202 1018 49.57760 86.36266 0.00000 0 1253.00000 895 45.53296 82.13051 0.00000 0 1050.000 861 46.66667 89.80994 0.00000 0 1146 747 46.09103 81.16342 0.00000 0 908 726 47.80992 87.65141 9.50000 0 1186 706 50.61190 90.40760 15.50000 0 1033.00000
sum_high_risk_N 1485 551.10707 467.38918 500.00000 0 1500 1018 542.64931 457.50745 489.50000 0 1500.00000 895 491.32626 443.05528 401.00000 0 1499.000 861 447.41696 428.28606 353.00000 0 1500 747 426.17269 421.70884 306.00000 0 1494 726 402.34573 413.47890 269.50000 0 1486 706 404.83711 404.64012 282.00000 0 1500.00000
avg_high_risk_Y 860 48.42007 25.44029 50.00000 0 100 526 50.22572 25.77831 50.00000 0 100.00000 440 49.45865 25.92063 50.00000 0 100.000 429 48.20887 25.36797 50.00000 0 100 379 47.10498 25.71907 50.00000 0 100 384 47.95643 24.95251 50.00000 0 100 386 47.47601 24.21687 50.00000 0 100.00000
avg_high_risk_N 1220 71.42816 22.65419 77.65476 0 100 820 73.02766 20.17925 77.42262 0 100.00000 690 72.85361 20.21446 76.60769 0 100.000 652 71.69236 20.76560 75.90417 0 100 558 71.36472 21.27820 77.00000 0 100 532 70.46809 21.70508 76.20202 0 100 522 69.95070 20.66678 75.00000 0 100.00000
sum_rew_Y 1485 417.17643 309.38861 344.00000 0 2917 1018 354.51277 261.31355 294.00000 0 2072.00000 895 346.74078 255.73152 294.00000 0 2101.000 861 334.40418 265.38334 280.00000 0 2861 747 332.93708 255.25434 279.00000 0 2000 726 345.73416 258.41634 291.00000 0 1900 706 363.89660 278.05688 303.50000 0 2495.00000
sum_rew_N 1485 818.99394 652.85004 801.00000 0 3000 1018 808.34872 654.12223 769.00000 0 2974.00000 895 743.37095 652.22608 669.00000 0 2642.000 861 704.28339 637.62167 637.00000 0 2591 747 668.05622 631.38072 547.00000 0 2846 726 636.16529 617.65120 484.00000 0 3063 706 662.05382 621.64123 530.00000 0 3097.00000
avg_rew_Y 1477 52.59260 17.53307 53.00000 0 100 1016 52.52020 17.84096 52.73214 0 100.00000 892 53.09423 18.55131 54.41667 0 100.000 856 51.93917 18.31702 52.85714 0 100 740 52.50050 18.50441 52.87302 0 100 720 53.30946 18.25996 53.73214 0 100 703 53.84147 18.06145 55.28571 0 99.91667
avg_rew_N 1294 51.63240 19.50755 52.24444 0 100 882 53.22360 20.10460 54.57419 0 100.00000 737 53.14159 19.79110 53.12500 0 100.000 708 54.42282 19.66078 53.94074 0 100 603 53.28601 20.00794 54.25000 0 100 582 54.92480 20.11108 54.48693 0 100 579 55.44498 19.76837 56.53333 0 100.00000
sum_low_rew_Y 1485 186.58855 94.92209 185.00000 0 584 1018 177.22986 92.02859 171.00000 0 471.00000 895 176.95084 93.95841 172.00000 0 500.000 861 170.57724 93.32347 165.00000 0 608 747 168.36546 95.91654 161.00000 0 550 726 172.25069 96.64543 164.00000 0 540 706 177.13031 93.59152 176.00000 0 593.00000
sum_low_rew_N 1485 103.73939 97.08647 89.00000 0 499 1018 102.99902 99.28965 89.00000 0 521.00000 895 91.96983 96.29504 74.00000 0 460.000 861 86.50290 96.75389 55.00000 0 474 747 83.38688 94.52931 56.00000 0 443 726 76.81405 90.86771 50.00000 0 505 706 79.29320 92.03157 55.00000 0 649.00000
avg_low_rew_Y 1470 47.46660 18.41246 48.63333 0 100 1012 48.26785 18.65756 49.90000 0 100.00000 889 48.94311 19.39125 50.00000 0 100.000 852 48.08872 19.16633 49.70833 0 100 739 48.49829 19.61240 50.00000 0 100 718 48.70778 19.21301 50.00000 0 100 699 49.25140 18.70270 50.00000 0 100.00000
avg_low_rew_N 1093 50.57158 23.47167 50.00000 0 100 721 52.34495 23.44032 50.00000 0 100.00000 601 51.96558 23.47094 50.00000 0 100.000 552 51.31838 23.42469 50.00000 0 100 479 52.61574 23.68327 50.00000 0 100 451 52.34428 24.64108 50.00000 0 100 446 53.73931 24.39009 50.50000 0 100.00000
sum_moderate_rew_Y 1485 139.56498 156.70092 96.00000 0 1300 1018 103.80354 129.24249 74.50000 0 1100.00000 895 100.68603 124.45433 69.00000 0 900.000 861 93.62253 121.10001 63.00000 0 1218 747 92.88086 122.66730 60.00000 0 1100 726 100.26309 124.30940 71.50000 0 1000 706 104.00708 134.66831 67.00000 0 1201.00000
sum_moderate_rew_N 1485 338.56094 279.05377 331.00000 0 1200 1018 332.77210 281.30000 313.50000 0 1285.00000 895 306.71285 279.63331 273.00000 0 1137.000 861 293.54472 273.61377 265.00000 0 1166 747 278.57430 269.34730 230.00000 0 1132 726 266.02893 269.65763 196.00000 0 1248 706 280.77195 273.18885 224.00000 0 1252.00000
avg_moderate_rew_Y 1208 56.33394 24.69142 59.16667 0 100 765 55.23049 25.65646 56.25000 0 100.00000 664 57.47821 26.85768 60.70833 0 100.000 624 54.98012 25.46704 55.50000 0 100 535 54.89870 25.59349 57.00000 0 100 520 58.57071 24.95060 59.50000 0 100 531 56.92545 25.33919 60.00000 0 100.00000
avg_moderate_rew_N 1145 53.49429 19.10508 53.83333 0 100 781 53.76948 19.41197 54.60000 0 100.00000 662 53.85156 19.17972 55.00000 0 100.000 624 55.12790 18.77903 55.14835 0 100 529 54.55982 19.32868 55.00000 0 100 498 55.85298 19.83728 56.74603 0 100 505 56.16582 20.09125 58.18182 0 100.00000
sum_high_rew_Y 1485 91.02290 129.12331 50.00000 0 1271 1018 73.47937 112.77948 20.00000 0 1128.00000 895 69.10391 105.89205 0.00000 0 859.000 861 70.20441 115.35125 0.00000 0 1128 747 71.69076 105.32182 9.00000 0 719 726 73.22039 105.65961 50.00000 0 863 706 82.75921 119.34784 50.00000 0 1019.00000
sum_high_rew_N 1485 376.69360 324.46414 325.00000 0 1400 1018 372.57760 320.86933 344.00000 0 1241.00000 895 344.68827 317.70390 294.00000 0 1271.000 861 324.23577 310.82210 255.00000 0 1383 747 306.09505 308.19791 216.00000 0 1351 726 293.32231 298.35440 195.00000 0 1344 706 301.98867 301.31752 200.00000 0 1362.00000
avg_high_rew_Y 860 70.16686 20.85981 73.50000 0 100 526 70.62167 22.35507 74.00000 0 100.00000 440 71.79139 21.91000 75.00000 0 100.000 429 70.08123 22.06468 75.00000 0 100 379 71.22157 21.67599 75.00000 0 100 384 71.86266 20.56741 75.00000 0 100 386 74.82674 20.54987 78.46429 0 100.00000
avg_high_rew_N 1220 51.44062 22.49346 52.00000 0 100 820 53.23718 22.46627 54.65476 0 100.00000 690 54.05226 23.21990 54.47222 0 100.000 652 55.36304 22.35525 55.10238 0 100 558 54.86194 23.09543 57.20000 0 100 532 55.77056 22.13105 56.57500 0 100 522 56.01724 21.97305 57.45833 0 100.00000
Code
# Map from time factors to numerical values
time_mapping <- c('Baseline' = 0,
                  'Two Week Follow-Up' = 1,
                  'One Month Follow-Up' = 2,
                  'Six Week Follow-Up' = 3,
                  'Two Month Follow-Up' = 4,
                  'Ten Week Follow-Up' = 5,
                  'Three Month Follow-Up' = 6)

# Store results in a list
summary_list <- list()

# Loop over each variable in the variable_summary data frame
for (i in seq_len(nrow(variable_summary))) {
  variable_name <- variable_summary$variable[i]
  iterations <- as.character(variable_summary$iterations[i])
  
  # Get the appropriate suffixes and time factors for the current number of iterations
  suffixes <- time_suffixes[[iterations]]
  factors <- time_factors[[iterations]]
  
  # Create a data frame to store summaries for the current variable
  variable_data <- list()
  
  # Calculate statistics for each suffix
  for (j in seq_along(suffixes)) {
    full_var_name <- paste0(variable_name, suffixes[j])
    # Check if the column exists in the data, to avoid errors
    if (full_var_name %in% names(mentalhealth_survey_data)) {
      temp_data <- mentalhealth_survey_data %>%
        dplyr::summarise(
          score_mean = mean(.data[[full_var_name]], na.rm = TRUE),
          n = sum(!is.na(.data[[full_var_name]])),  # Sample size excluding NAs
          sem = sd(.data[[full_var_name]], na.rm = TRUE) / sqrt(sum(!is.na(.data[[full_var_name]]))),
          .groups = 'drop'
        ) %>%
        mutate(
          time_factor = factors[j],
          time = time_mapping[[factors[j]]]  # Assign numerical time based on factor
        )
      
      variable_data[[j]] <- temp_data
    }
  }
  
  # Combine all time point data into one data frame for the current variable
  variable_data <- bind_rows(variable_data)
  summary_list[[variable_name]] <- variable_data
}

# Optionally, assign each summary data frame to its own named object for later use
for (name in names(summary_list)) {
  assign(paste0("filtered_", name, "_summary"), summary_list[[name]], envir = .GlobalEnv)
}

Average risk and reward scores are visualized by time point, first with density ridge plots and then with raincloud plots (jittered points, box plots, and half-violins). Along the way, summary data frames of group means and standard errors are computed and overlaid on the raincloud plots as mean ± SEM markers.
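The per-group summary pattern behind those mean ± SEM overlays can be sketched on hypothetical toy data (not study data):

```r
library(dplyr)

# Hypothetical scores at two time points
toy <- data.frame(
  time_factor = rep(c("Baseline", "Two Week Follow-Up"), each = 3),
  score       = c(1, 2, 3, 4, 6, 8)
)

# Mean, group size, and standard error of the mean per time point
toy_summary <- toy %>%
  group_by(time_factor) %>%
  summarise(
    score_mean = mean(score, na.rm = TRUE),
    n          = n(),
    sem        = sd(score, na.rm = TRUE) / sqrt(n()),
    .groups    = "drop"
  )

toy_summary
```

Each row of such a summary supplies the y position (`score_mean`) for a `geom_point()` marker and the `ymin`/`ymax` bounds (`score_mean ± sem`) for a `geom_errorbar()`.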

Code
# Prepare the time factor
covid_long_final <- covid_long_final %>%
  mutate(
    time_factor = factor(time, levels = 0:6, labels = c(
      "Baseline", "Two Week Follow-Up", "One Month Follow-Up", 
      "Six Week Follow-Up", "Two Month Follow-Up", "Ten Week Follow-Up", 
      "Three Month Follow-Up"
    ))
  )

# Creating the density ridge plot for average 'yes' risk scores
p1 <- ggplot(covid_long_final, aes(x = avg_risk_Y, y = time_factor, fill = time_factor)) +
  ggridges::geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha = 0.5, size = 0.25, point_shape = 20, aes(color = time_factor)) +
  labs(x = "Avg 'Yes' Risk Score", y = '') +
  guides(fill = "none", color = "none") +  # Logical values are deprecated in ggplot2 >= 3.3.4
  theme_minimal() +
  xlim(0, 100)

# Creating the density ridge plot for average 'no' risk scores
p2 <- ggplot(covid_long_final, aes(x = avg_risk_N, y = time_factor, fill = time_factor)) +
  ggridges::geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha = 0.5, size = 0.25, point_shape = 20, aes(color = time_factor)) +
  labs(x = "Avg 'No' Risk Score", y = '') +
  guides(fill = "none", color = "none") +
  theme_minimal() +
  xlim(0, 100)

# Display the density plots
grid.arrange(p1, p2, ncol = 2)

Code
# Creating the density ridge plot for average 'yes' reward scores
p3 <- ggplot(covid_long_final, aes(x = avg_rew_Y, y = time_factor, fill = time_factor)) +
  ggridges::geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha = 0.5, size = 0.25, point_shape = 20, aes(color = time_factor)) +
  labs(x = "Avg 'Yes' Reward Score", y = '') +
  guides(fill = "none", color = "none") +
  theme_minimal() +
  xlim(0, 100)

# Creating the density ridge plot for average 'no' reward scores
p4 <- ggplot(covid_long_final, aes(x = avg_rew_N, y = time_factor, fill = time_factor)) +
  ggridges::geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha = 0.5, size = 0.25, point_shape = 20, aes(color = time_factor)) +
  labs(x = "Avg 'No' Reward Score", y = '') +
  guides(fill = "none", color = "none") +
  theme_minimal() +
  xlim(0, 100)

# Display the density plots
grid.arrange(p3, p4, ncol = 2)

Code
# Create the summary avg risk 'Y' data frame
filtered_AvgRiskY_summary <- covid_long_final %>%
  group_by(time_factor) %>%
  dplyr::summarise(
    score_mean = mean(avg_risk_Y, na.rm = TRUE),
    n = n(), # Sample size for each group
    sem = sd(avg_risk_Y, na.rm = TRUE) / sqrt(n()),
    .groups = 'drop'
  )

# Create the summary avg risk 'N' data frame
filtered_AvgRiskN_summary <- covid_long_final %>%
  group_by(time_factor) %>%
  dplyr::summarise(
    score_mean = mean(avg_risk_N, na.rm = TRUE),
    n = n(), # Sample size for each group
    sem = sd(avg_risk_N, na.rm = TRUE) / sqrt(n()),
    .groups = 'drop'
  )

# Create the summary avg reward 'Y' data frame
filtered_AvgRewY_summary <- covid_long_final %>%
  group_by(time_factor) %>%
  dplyr::summarise(
    score_mean = mean(avg_rew_Y, na.rm = TRUE),
    n = n(), # Sample size for each group
    sem = sd(avg_rew_Y, na.rm = TRUE) / sqrt(n()),
    .groups = 'drop'
  )

# Create the summary avg reward 'N' data frame
filtered_AvgRewN_summary <- covid_long_final %>%
  group_by(time_factor) %>%
  dplyr::summarise(
    score_mean = mean(avg_rew_N, na.rm = TRUE),
    n = n(), # Sample size for each group
    sem = sd(avg_rew_N, na.rm = TRUE) / sqrt(n()),
    .groups = 'drop'
  )

# Create an avg 'yes' risk raincloud plot
p5 <- ggplot(covid_long_final, aes(x = time_factor, y = avg_risk_Y, fill = time_factor, color = time_factor)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_risk_Y, colour = time_factor), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_point(data = filtered_AvgRiskY_summary, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRiskY_summary, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'Yes' Risk Scores by Time Point", 
    y = "Score", 
    x = "Time Point", 
    fill = "Time Point",  # Legend title for fill
    color = "Time Point"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)

print(p5)

Code
# Create an avg 'no' risk raincloud plot
p6 <- ggplot(covid_long_final, aes(x = time_factor, y = avg_risk_N, fill = time_factor, color = time_factor)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_risk_N, colour = time_factor), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_point(data = filtered_AvgRiskN_summary, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRiskN_summary, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'No' Risk Scores by Time Point", 
    y = "Score", 
    x = "Time Point", 
    fill = "Time Point",  # Legend title for fill
    color = "Time Point"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)

print(p6)

Code
# Combine raincloud plots and place the legend on the right
combined_plot56 <- p5 + p6 + plot_layout(guides = 'collect') &
  theme(legend.position = 'right')

# Print the combined plot
print(combined_plot56)

Code
# Create an average 'yes' reward raincloud plot
p7 <- ggplot(covid_long_final, aes(x = time_factor, y = avg_rew_Y, fill = time_factor, color = time_factor)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_rew_Y, colour = time_factor), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_point(data = filtered_AvgRewY_summary, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRewY_summary, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'Yes' Reward Scores by Time Point", 
    y = "Score", 
    x = "Time Point", 
    fill = "Time Point",  # Legend title for fill
    color = "Time Point"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)

print(p7)

Code
# Create an average 'no' reward raincloud plot
p8 <- ggplot(covid_long_final, aes(x = time_factor, y = avg_rew_N, fill = time_factor, color = time_factor)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_rew_N, colour = time_factor), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_point(data = filtered_AvgRewN_summary, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRewN_summary, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'No' Reward Scores by Time Point", 
    y = "Score", 
    x = "Time Point", 
    fill = "Time Point",  # Legend title for fill
    color = "Time Point"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)

print(p8)

Code
# Combine raincloud plots and place the legend on the right
combined_plot78 <- p7 + p8 + plot_layout(guides = 'collect') &
  theme(legend.position = 'right')

# Print the combined plot
print(combined_plot78)

Code
### Repeat the steps above, stratifying by both time point and epidemiological wave
# Create the summary avg risk 'Y' data frame
filtered_AvgRiskY_waves <- covid_long_final %>%
  group_by(time_factor, covid_group) %>%
  dplyr::summarise(
    score_mean = mean(avg_risk_Y, na.rm = TRUE),
    n = sum(!is.na(avg_risk_Y)), # Count only non-missing scores so the SEM denominator matches the mean
    sem = sd(avg_risk_Y, na.rm = TRUE) / sqrt(sum(!is.na(avg_risk_Y))),
    .groups = 'drop'
  )

# Create the summary avg risk 'N' data frame
filtered_AvgRiskN_waves <- covid_long_final %>%
  group_by(time_factor, covid_group) %>%
  dplyr::summarise(
    score_mean = mean(avg_risk_N, na.rm = TRUE),
    n = sum(!is.na(avg_risk_N)), # Count only non-missing scores
    sem = sd(avg_risk_N, na.rm = TRUE) / sqrt(sum(!is.na(avg_risk_N))),
    .groups = 'drop'
  )

# Create the summary avg reward 'Y' data frame
filtered_AvgRewY_waves <- covid_long_final %>%
  group_by(time_factor, covid_group) %>%
  dplyr::summarise(
    score_mean = mean(avg_rew_Y, na.rm = TRUE),
    n = sum(!is.na(avg_rew_Y)), # Count only non-missing scores
    sem = sd(avg_rew_Y, na.rm = TRUE) / sqrt(sum(!is.na(avg_rew_Y))),
    .groups = 'drop'
  )

# Create the summary avg reward 'N' data frame
filtered_AvgRewN_waves <- covid_long_final %>%
  group_by(time_factor, covid_group) %>%
  dplyr::summarise(
    score_mean = mean(avg_rew_N, na.rm = TRUE),
    n = sum(!is.na(avg_rew_N)), # Count only non-missing scores
    sem = sd(avg_rew_N, na.rm = TRUE) / sqrt(sum(!is.na(avg_rew_N))),
    .groups = 'drop'
  )

# Create an average 'yes' risk raincloud plot
ggplot(covid_long_final, aes(x = time_factor, y = avg_risk_Y, fill = covid_group, color = covid_group)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_risk_Y, colour = covid_group), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_line(data = filtered_AvgRiskY_waves, aes(x = factor(time_factor), y = score_mean, group = covid_group, colour = covid_group), linetype = 2, position = position_nudge(x = 0.2)) +
  geom_point(data = filtered_AvgRiskY_waves, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRiskY_waves, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'Yes' Risk Scores by Time Point & Wave", 
    y = "Score", 
    x = "Time Point", 
    fill = "COVID Wave",  # Legend title for fill
    color = "COVID Wave"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)
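Each of the wave summaries above reports a group mean alongside its standard error. The SEM arithmetic can be sketched in base R; `sem()` here is a hypothetical helper that mirrors the `sd(..., na.rm = TRUE) / sqrt(n)` pattern in the `summarise()` calls, counting only non-missing scores:

```r
# Standard error of the mean: sd over the square root of the number of
# non-missing observations (hypothetical helper for illustration)
sem <- function(x) {
  x <- x[!is.na(x)]        # drop missing scores before counting n
  sd(x) / sqrt(length(x))
}

scores <- c(40, 50, 60, NA)  # toy scores; the NA is excluded from sd and n
round(sem(scores), 2)        # sd = 10, n = 3 -> 10 / sqrt(3), about 5.77
```

Dropping the missing values before counting keeps the denominator consistent with the observations that actually enter the standard deviation.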

Code
# Create an average 'no' risk raincloud plot
ggplot(covid_long_final, aes(x = time_factor, y = avg_risk_N, fill = covid_group, color = covid_group)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_risk_N, colour = covid_group), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_line(data = filtered_AvgRiskN_waves, aes(x = factor(time_factor), y = score_mean, group = covid_group, colour = covid_group), linetype = 2, position = position_nudge(x = 0.2)) +
  geom_point(data = filtered_AvgRiskN_waves, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRiskN_waves, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'No' Risk Scores by Time Point & Wave", 
    y = "Score", 
    x = "Time Point", 
    fill = "COVID Wave",  # Legend title for fill
    color = "COVID Wave"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)

Code
# Create an average 'yes' reward raincloud plot
ggplot(covid_long_final, aes(x = time_factor, y = avg_rew_Y, fill = covid_group, color = covid_group)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_rew_Y, colour = covid_group), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_line(data = filtered_AvgRewY_waves, aes(x = factor(time_factor), y = score_mean, group = covid_group, colour = covid_group), linetype = 2, position = position_nudge(x = 0.2)) +
  geom_point(data = filtered_AvgRewY_waves, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRewY_waves, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'Yes' Reward Scores by Time Point & Wave", 
    y = "Score", 
    x = "Time Point", 
    fill = "COVID Wave",  # Legend title for fill
    color = "COVID Wave"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)

Code
# Create an average 'no' reward raincloud plot
ggplot(covid_long_final, aes(x = time_factor, y = avg_rew_N, fill = covid_group, color = covid_group)) +
  geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(time_factor)-0.25, y = avg_rew_N, colour = covid_group), position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25, position = position_dodge(width = 0.3), colour = "black") +
  geom_line(data = filtered_AvgRewN_waves, aes(x = factor(time_factor), y = score_mean, group = covid_group, colour = covid_group), linetype = 2, position = position_nudge(x = 0.2)) +
  geom_point(data = filtered_AvgRewN_waves, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_AvgRewN_waves, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
  labs(
    title = "Avg 'No' Reward Scores by Time Point & Wave", 
    y = "Score", 
    x = "Time Point", 
    fill = "COVID Wave",  # Legend title for fill
    color = "COVID Wave"  # Legend title for color
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"), # Make legend title bold
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
  ) +
  ylim(-5, 105)

Code
# Count unique subjects in each category of 'covid_group'
covid_wave_counts <- covid_long_final %>%
  distinct(record_id, covid_group) %>%
  count(covid_group)

# Generate bar plot showing number of subjects in each wave
ggplot(covid_wave_counts, aes(x = covid_group, y = n, fill = covid_group)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = n), vjust = -0.3, size = 3.5) +  # Add labels above bars
  labs(title = "Count of Subjects per COVID Wave",
       x = "COVID Wave",
       y = "Number of Subjects") +
  theme_minimal() +
  theme(legend.position = "none")  # Remove legend if not needed

Code
# Extract month-year and count distinct subjects
monthly_baseline <- covid_long_final %>%
  distinct(record_id, baseline_date_complete) %>%
  mutate(month_year = format(baseline_date_complete, "%Y-%m")) %>%
  filter(baseline_date_complete >= as.Date("2020-05-01") & baseline_date_complete <= as.Date("2021-02-28")) %>%
  count(month_year)

# Plot the number of subjects completing baseline in each month-year
ggplot(monthly_baseline, aes(x = month_year, y = n, fill = month_year)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = n), vjust = -0.5, size = 3.5) +  # Adding count labels above the bars
  labs(title = "Monthly Count of Subjects Completing Baseline Session",
       x = "Month-Year",
       y = "Number of Subjects") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels for better readability
        legend.position = "none")  # Remove the legend
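The month-year binning above relies on `format()` collapsing each date to a `"%Y-%m"` label before counting. A minimal base-R sketch, with toy dates standing in for `baseline_date_complete`:

```r
# Collapse full dates to month-year labels, then count per label
dates <- as.Date(c("2020-05-03", "2020-05-28", "2020-06-10"))
month_year <- format(dates, "%Y-%m")  # "2020-05" "2020-05" "2020-06"
table(month_year)                     # 2020-05 -> 2, 2020-06 -> 1
```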

Code
# Loop over each unique variable in the 'variable' column
for (variable in unique(mentalhealth_survey_data_long$variable)) {
  # Filter the data for the current variable
  data_filtered <- mentalhealth_survey_data_long %>% 
    filter(variable == !!variable)

  # Retrieve the matching summary data frame, named "filtered_<variable>_summary"
  summary_data <- get(paste0("filtered_", variable, "_summary"))

  # Ensure time_factor is ordered correctly
  summary_data$time_factor <- factor(summary_data$time_factor, levels = c(
    "Baseline", "Two Week Follow-Up", "One Month Follow-Up", 
    "Six Week Follow-Up", "Two Month Follow-Up", "Ten Week Follow-Up", 
    "Three Month Follow-Up"
  ))
  
  # Create the plot
  p <- ggplot(data_filtered, aes(x = time_factor, y = value, fill = time_factor, color = time_factor)) +
    geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5, position = position_nudge(x = 0.2, y = 0), colour = NA) +
    geom_point(position = position_jitter(width = .05), size = .5, shape = 20) +
    geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.15, position = position_nudge(x = -0.2, y = 0), colour = "black") +
    geom_point(data = summary_data, aes(x = factor(time_factor), y = score_mean), shape = 18, position = position_nudge(x = 0.2)) +
    geom_errorbar(data = summary_data, aes(x = factor(time_factor), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width = 0.05, position = position_nudge(x = 0.2)) +
    geom_line(data = summary_data, aes(x = factor(time_factor), y = score_mean), group = 1, colour = "black", size = 0.5, linetype = "dashed", position = position_nudge(x = 0.2)) +  # modified line
    labs(
      title = paste(variable, "Scores by Time Point"), 
      y = "Score", 
      x = "Time Point", 
      fill = "Time Point",  # Legend title for fill
      color = "Time Point"  # Legend title for color
    ) +
    theme_minimal() +
    theme(
      legend.position = "right",
      legend.title = element_text(face = "bold"), # Make legend title bold
      axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels by 45 degrees
    )

  # Print the plot
  print(p)
}

Behavioral Computations: Multilevel Modeling & Visualizations

Random-intercept models are fitted to the behavioral risk and reward metrics. Results are displayed in formatted tables that report both fixed and random effects across the behavior sums and averages.
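For a random-intercept model, the ICC that `tab_model` reports is the between-person intercept variance as a share of the total variance, ICC = τ00 / (τ00 + σ²). A quick check using the `yes_counts` variance components from the table in this section:

```r
# ICC = between-person variance / (between-person + residual variance);
# values taken from the yes_counts column of the random-intercept table
tau00  <- 9.06   # between-person (intercept) variance
sigma2 <- 4.55   # level-1 residual variance
icc <- tau00 / (tau00 + sigma2)
round(icc, 2)    # 0.67, matching the reported ICC for yes_counts
```

An ICC this high indicates that most variance in behavior counts lies between people rather than within them over time, which motivates the person-level random effects.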

Code
# Rename 'record_id' to 'id' in the dataframe 'covid_long_final'
covid_long_final <- covid_long_final %>%
  dplyr::rename(id = record_id)

# Install the Matrix package once, only if it is not already available
if (!requireNamespace("Matrix", quietly = TRUE)) install.packages("Matrix")

Code
library(Matrix)

# Random intercept models
model_random_intercept_num_beh_Y_sum <- lmer(yes_counts ~ time + (1|id), data = covid_long_final)
model_random_intercept_num_beh_N_sum <- lmer(no_counts ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_risk_Y <- lmer(avg_risk_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_risk_N <- lmer(avg_risk_N ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_rew_Y <- lmer(avg_rew_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_rew_N <- lmer(avg_rew_N ~ time + (1|id), data = covid_long_final)

model_random_intercept_avg_low_risk_Y <- lmer(avg_low_risk_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_low_risk_N <- lmer(avg_low_risk_N ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_low_rew_Y <- lmer(avg_low_rew_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_low_rew_N <- lmer(avg_low_rew_N ~ time + (1|id), data = covid_long_final)

model_random_intercept_avg_moderate_risk_Y <- lmer(avg_moderate_risk_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_moderate_risk_N <- lmer(avg_moderate_risk_N ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_moderate_rew_Y <- lmer(avg_moderate_rew_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_moderate_rew_N <- lmer(avg_moderate_rew_N ~ time + (1|id), data = covid_long_final)

model_random_intercept_avg_high_risk_Y <- lmer(avg_high_risk_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_high_risk_N <- lmer(avg_high_risk_N ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_high_rew_Y <- lmer(avg_high_rew_Y ~ time + (1|id), data = covid_long_final)
model_random_intercept_avg_high_rew_N <- lmer(avg_high_rew_N ~ time + (1|id), data = covid_long_final)

# Creating a grouped table with formatted layout including all models
tab_model(
  title = "Random Intercept Model Results - Risk/Reward Behavior Sums & Averages",
  model_random_intercept_num_beh_Y_sum,
  model_random_intercept_num_beh_N_sum,
  model_random_intercept_avg_risk_Y,
  model_random_intercept_avg_risk_N,
  model_random_intercept_avg_rew_Y,
  model_random_intercept_avg_rew_N,
  model_random_intercept_avg_low_risk_Y,
  model_random_intercept_avg_low_risk_N,
  model_random_intercept_avg_low_rew_Y,
  model_random_intercept_avg_low_rew_N,
  model_random_intercept_avg_moderate_risk_Y,
  model_random_intercept_avg_moderate_risk_N,
  model_random_intercept_avg_moderate_rew_Y,
  model_random_intercept_avg_moderate_rew_N,
  model_random_intercept_avg_high_risk_Y,
  model_random_intercept_avg_high_risk_N,
  model_random_intercept_avg_high_rew_Y,
  model_random_intercept_avg_high_rew_N
)
Random Intercept Model Results - Risk/Reward Behavior Sums & Averages
  yes_counts no_counts avg_risk_Y avg_risk_N avg_rew_Y avg_rew_N avg_low_risk_Y avg_low_risk_N avg_low_rew_Y avg_low_rew_N avg_moderate_risk_Y avg_moderate_risk_N avg_moderate_rew_Y avg_moderate_rew_N avg_high_risk_Y avg_high_risk_N avg_high_rew_Y avg_high_rew_N
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 7.16 6.99 – 7.33 <0.001 16.26 15.74 – 16.77 <0.001 37.41 36.56 – 38.27 <0.001 60.30 59.26 – 61.35 <0.001 52.74 51.92 – 53.55 <0.001 51.75 50.80 – 52.70 <0.001 31.29 30.43 – 32.14 <0.001 31.84 30.73 – 32.94 <0.001 48.06 47.21 – 48.92 <0.001 50.57 49.37 – 51.77 <0.001 45.49 44.35 – 46.63 <0.001 58.50 57.46 – 59.53 <0.001 55.68 54.43 – 56.92 <0.001 53.22 52.26 – 54.19 <0.001 51.31 49.88 – 52.74 <0.001 72.59 71.50 – 73.69 <0.001 68.85 67.62 – 70.08 <0.001 51.54 50.44 – 52.65 <0.001
time -0.15 -0.18 – -0.13 <0.001 -0.82 -0.92 – -0.73 <0.001 -0.12 -0.25 – 0.01 0.079 -0.73 -0.90 – -0.56 <0.001 0.31 0.17 – 0.45 <0.001 0.62 0.44 – 0.80 <0.001 0.11 -0.02 – 0.25 0.093 0.44 0.21 – 0.68 <0.001 0.33 0.18 – 0.48 <0.001 0.35 0.07 – 0.64 0.016 -0.26 -0.48 – -0.04 0.019 -0.63 -0.79 – -0.47 <0.001 0.10 -0.18 – 0.37 0.480 0.53 0.33 – 0.73 <0.001 -0.59 -0.87 – -0.31 <0.001 -0.90 -1.07 – -0.74 <0.001 0.49 0.21 – 0.78 0.001 0.96 0.76 – 1.17 <0.001
Random Effects
σ2 4.55 47.58 90.85 125.59 104.29 148.03 94.11 185.35 124.34 289.88 188.35 94.86 307.94 145.41 205.18 100.25 222.37 166.53
τ00 9.06 id 81.45 id 258.74 id 351.83 id 216.35 id 252.16 id 252.92 id 293.94 id 229.86 id 281.97 id 355.16 id 334.19 id 350.21 id 233.63 id 441.34 id 398.22 id 255.18 id 347.90 id
ICC 0.67 0.63 0.74 0.74 0.67 0.63 0.73 0.61 0.65 0.49 0.65 0.78 0.53 0.62 0.68 0.80 0.53 0.68
N 1754 id 1754 id 1748 id 1655 id 1748 id 1655 id 1744 id 1513 id 1744 id 1513 id 1601 id 1537 id 1601 id 1537 id 1292 id 1582 id 1292 id 1582 id
Observations 6438 6438 6404 5385 6404 5385 6379 4343 6379 4343 4846 4744 4847 4744 3404 4994 3404 4994
Marginal R2 / Conditional R2 0.007 / 0.668 0.021 / 0.639 0.000 / 0.740 0.005 / 0.738 0.001 / 0.675 0.004 / 0.632 0.000 / 0.729 0.002 / 0.614 0.001 / 0.649 0.001 / 0.494 0.001 / 0.654 0.004 / 0.780 0.000 / 0.532 0.003 / 0.618 0.002 / 0.683 0.007 / 0.800 0.002 / 0.535 0.007 / 0.679

Linear and quadratic growth models are then fitted to explore longitudinal trajectories and capture curvature in trends over time, combining fixed effects of time with person-specific random intercepts and slopes. The outcomes of these models are presented in structured tables.
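Whether the extra random-slope (or quadratic) terms are warranted can be checked with likelihood-ratio tests on nested fits. A self-contained sketch on simulated toy data, assuming `lme4` is available; as in the models below, all fits use `REML = FALSE` so the tests involving fixed effects are valid:

```r
library(lme4)

# Simulate toy longitudinal data: 40 subjects x 7 time points, with
# person-specific intercepts and slopes around a common negative trend
set.seed(1)
toy <- expand.grid(id = factor(1:40), time = 0:6)
subj_int <- rnorm(40, sd = 3)    # person-specific intercept deviations
subj_slo <- rnorm(40, sd = 0.3)  # person-specific slope deviations
toy$y <- 50 + subj_int[as.integer(toy$id)] +
  (-0.7 + subj_slo[as.integer(toy$id)]) * toy$time +
  rnorm(nrow(toy), sd = 2)

# Nested fits: random intercept only vs. random intercept + slope
m_int   <- lmer(y ~ time + (1 | id),        data = toy, REML = FALSE)
m_slope <- lmer(y ~ time + (1 + time | id), data = toy, REML = FALSE)

# anova() on nested merMod fits reports AIC, BIC, and a chi-square LRT
anova(m_int, m_slope)
```

The same comparison extends to the quadratic models: each added polynomial term (and its random effect) can be tested against the simpler nested fit, with AIC/BIC as a tiebreaker.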

Code
# Fit linear growth model with random intercepts and slopes and homoscedastic level-1 residuals
model_random_slope_num_beh_Y_sum <- lmer(yes_counts ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control=lmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
model_random_slope_num_beh_N_sum <- lmer(no_counts ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control=lmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
model_random_slope_avg_risk_Y <- lmer(avg_risk_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_risk_N <- lmer(avg_risk_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_rew_Y <- lmer(avg_rew_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_rew_N <- lmer(avg_rew_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))

model_random_slope_avg_low_risk_Y <- lmer(avg_low_risk_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_low_risk_N <- lmer(avg_low_risk_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_low_rew_Y <- lmer(avg_low_rew_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_low_rew_N <- lmer(avg_low_rew_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))

model_random_slope_avg_moderate_risk_Y <- lmer(avg_moderate_risk_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_moderate_risk_N <- lmer(avg_moderate_risk_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_moderate_rew_Y <- lmer(avg_moderate_rew_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_moderate_rew_N <- lmer(avg_moderate_rew_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))

model_random_slope_avg_high_risk_Y <- lmer(avg_high_risk_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_high_risk_N <- lmer(avg_high_risk_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_high_rew_Y <- lmer(avg_high_rew_Y ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_random_slope_avg_high_rew_N <- lmer(avg_high_rew_N ~ time + (1 + time|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))

# Creating a grouped table with formatted layout including all models
tab_model(
  title = "Random Linear Growth Model Results - Risk/Reward Behavior Sums & Averages",
  model_random_slope_num_beh_Y_sum,
  model_random_slope_num_beh_N_sum,
  model_random_slope_avg_risk_Y,
  model_random_slope_avg_risk_N,
  model_random_slope_avg_rew_Y,
  model_random_slope_avg_rew_N,
  model_random_slope_avg_low_risk_Y,
  model_random_slope_avg_low_risk_N,
  model_random_slope_avg_low_rew_Y,
  model_random_slope_avg_low_rew_N,
  model_random_slope_avg_moderate_risk_Y,
  model_random_slope_avg_moderate_risk_N,
  model_random_slope_avg_moderate_rew_Y,
  model_random_slope_avg_moderate_rew_N,
  model_random_slope_avg_high_risk_Y,
  model_random_slope_avg_high_risk_N,
  model_random_slope_avg_high_rew_Y,
  model_random_slope_avg_high_rew_N
)
Random Linear Growth Model Results - Risk/Reward Behavior Sums & Averages
  yes_counts no_counts avg_risk_Y avg_risk_N avg_rew_Y avg_rew_N avg_low_risk_Y avg_low_risk_N avg_low_rew_Y avg_low_rew_N avg_moderate_risk_Y avg_moderate_risk_N avg_moderate_rew_Y avg_moderate_rew_N avg_high_risk_Y avg_high_risk_N avg_high_rew_Y avg_high_rew_N
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 7.17 7.00 – 7.35 <0.001 16.20 15.67 – 16.73 <0.001 37.36 36.51 – 38.21 <0.001 60.21 59.16 – 61.26 <0.001 52.78 51.97 – 53.59 <0.001 51.80 50.85 – 52.75 <0.001 31.26 30.42 – 32.10 <0.001 31.83 30.71 – 32.94 <0.001 48.07 47.22 – 48.92 <0.001 50.62 49.41 – 51.82 <0.001 45.39 44.27 – 46.52 <0.001 58.47 57.43 – 59.51 <0.001 55.75 54.49 – 57.00 <0.001 53.36 52.38 – 54.35 <0.001 51.25 49.78 – 52.72 <0.001 72.53 71.42 – 73.63 <0.001 68.86 67.62 – 70.10 <0.001 51.55 50.43 – 52.66 <0.001
time -0.17 -0.20 – -0.13 <0.001 -0.80 -0.92 – -0.69 <0.001 -0.09 -0.24 – 0.06 0.251 -0.70 -0.91 – -0.50 <0.001 0.29 0.13 – 0.45 <0.001 0.61 0.39 – 0.82 <0.001 0.13 -0.02 – 0.28 0.093 0.44 0.17 – 0.71 0.001 0.33 0.16 – 0.50 <0.001 0.35 0.03 – 0.68 0.034 -0.21 -0.46 – 0.03 0.089 -0.63 -0.82 – -0.43 <0.001 0.06 -0.25 – 0.38 0.701 0.46 0.22 – 0.70 <0.001 -0.55 -0.87 – -0.23 0.001 -0.88 -1.08 – -0.67 <0.001 0.49 0.19 – 0.78 0.001 0.98 0.74 – 1.22 <0.001
Random Effects
σ2 4.10 40.45 83.43 113.93 96.94 134.52 88.36 167.91 116.50 265.00 177.20 84.88 281.48 125.81 184.16 85.52 216.45 150.98
τ00 10.11 id 89.45 id 255.04 id 356.81 id 216.39 id 254.36 id 243.52 id 307.03 id 228.07 id 294.51 id 346.04 id 337.52 id 369.84 id 251.69 id 481.98 id 405.42 id 262.27 id 354.75 id
τ11 0.12 id.time 1.74 id.time 1.80 id.time 2.91 id.time 1.80 id.time 3.41 id.time 1.40 id.time 4.39 id.time 1.91 id.time 6.20 id.time 2.64 id.time 2.53 id.time 6.31 id.time 5.01 id.time 5.12 id.time 3.82 id.time 1.37 id.time 3.90 id.time
ρ01 -0.37 id -0.31 id -0.06 id -0.13 id -0.11 id -0.15 id 0.03 id -0.22 id -0.09 id -0.24 id -0.02 id -0.12 id -0.24 id -0.28 id -0.31 id -0.15 id -0.17 id -0.16 id
ICC 0.69 0.69 0.76 0.76 0.70 0.67 0.75 0.65 0.67 0.54 0.68 0.80 0.57 0.67 0.71 0.83 0.55 0.71
N 1754 id 1754 id 1748 id 1655 id 1748 id 1655 id 1744 id 1513 id 1744 id 1513 id 1601 id 1537 id 1601 id 1537 id 1292 id 1582 id 1292 id 1582 id
Observations 6438 6438 6404 5385 6404 5385 6379 4343 6379 4343 4846 4744 4847 4744 3404 4994 3404 4994
Marginal R2 / Conditional R2 0.009 / 0.697 0.021 / 0.693 0.000 / 0.762 0.004 / 0.763 0.001 / 0.699 0.004 / 0.666 0.000 / 0.747 0.002 / 0.650 0.001 / 0.673 0.001 / 0.537 0.000 / 0.676 0.004 / 0.804 0.000 / 0.574 0.002 / 0.670 0.002 / 0.714 0.006 / 0.831 0.002 / 0.547 0.008 / 0.710
Code
# Quadratic model with random intercept and random slope
model_quad_num_beh_Y_sum <- lmer(yes_counts ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun= 2e5)))
model_quad_num_beh_N_sum <- lmer(no_counts ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun= 2e5)))
model_quad_avg_risk_Y <- lmer(avg_risk_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_risk_N <- lmer(avg_risk_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_rew_Y <- lmer(avg_rew_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_rew_N <- lmer(avg_rew_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))

model_quad_avg_low_risk_Y <- lmer(avg_low_risk_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
# The commented-out models here and below are excluded from the summary table
#model_quad_avg_low_risk_N <- lmer(avg_low_risk_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_low_rew_Y <- lmer(avg_low_rew_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
#model_quad_avg_low_rew_N <- lmer(avg_low_rew_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))

model_quad_avg_moderate_risk_Y <- lmer(avg_moderate_risk_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_moderate_risk_N <- lmer(avg_moderate_risk_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_moderate_rew_Y <- lmer(avg_moderate_rew_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_moderate_rew_N <- lmer(avg_moderate_rew_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))

#model_quad_avg_high_risk_Y <- lmer(avg_high_risk_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_high_risk_N <- lmer(avg_high_risk_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
#model_quad_avg_high_rew_Y <- lmer(avg_high_rew_Y ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))
model_quad_avg_high_rew_N <- lmer(avg_high_rew_N ~ time + I(time^2) + (1 + time + I(time^2)|id), data = covid_long_final, REML = FALSE, control = lmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 2e5)))

# Creating a grouped table with formatted layout including all models
tab_model(
  title = "Random Quadratic Growth Model Results - Risk/Reward Behavior Sums & Averages",
  model_quad_num_beh_Y_sum,
  model_quad_num_beh_N_sum,
  model_quad_avg_risk_Y,
  model_quad_avg_risk_N,
  model_quad_avg_rew_Y,
  model_quad_avg_rew_N,
  model_quad_avg_low_risk_Y,
  #model_quad_avg_low_risk_N,
  model_quad_avg_low_rew_Y,
  #model_quad_avg_low_rew_N,
  model_quad_avg_moderate_risk_Y,
  model_quad_avg_moderate_risk_N,
  model_quad_avg_moderate_rew_Y,
  model_quad_avg_moderate_rew_N,
  #model_quad_avg_high_risk_Y,
  model_quad_avg_high_risk_N,
  #model_quad_avg_high_rew_Y,
  model_quad_avg_high_rew_N
)
Random Quadratic Growth Model Results - Risk/Reward Behavior Sums & Averages

Fixed effects (Estimate [95% CI], p):

| Outcome | (Intercept) | time | time² |
|---|---|---|---|
| yes_counts | 7.43 [7.24, 7.61], <0.001 | -0.63 [-0.73, -0.53], <0.001 | 0.08 [0.07, 0.10], <0.001 |
| no_counts | 16.38 [15.82, 16.94], <0.001 | -1.11 [-1.45, -0.76], <0.001 | 0.06 [0.00, 0.11], 0.037 |
| avg_risk_Y | 36.96 [36.09, 37.83], <0.001 | 0.61 [0.16, 1.07], 0.008 | -0.13 [-0.20, -0.05], 0.001 |
| avg_risk_N | 59.99 [58.91, 61.08], <0.001 | -0.34 [-0.93, 0.24], 0.249 | -0.06 [-0.16, 0.03], 0.193 |
| avg_rew_Y | 52.92 [52.08, 53.77], <0.001 | 0.04 [-0.43, 0.51], 0.877 | 0.04 [-0.03, 0.12], 0.244 |
| avg_rew_N | 51.54 [50.54, 52.55], <0.001 | 1.06 [0.44, 1.68], 0.001 | -0.08 [-0.18, 0.02], 0.110 |
| avg_low_risk_Y | 30.71 [29.85, 31.58], <0.001 | 1.08 [0.61, 1.55], <0.001 | -0.17 [-0.24, -0.09], <0.001 |
| avg_low_rew_Y | 47.95 [47.07, 48.84], <0.001 | 0.53 [0.01, 1.04], 0.044 | -0.04 [-0.12, 0.05], 0.399 |
| avg_moderate_risk_Y | 44.64 [43.47, 45.82], <0.001 | 1.20 [0.46, 1.94], 0.002 | -0.25 [-0.38, -0.13], <0.001 |
| avg_moderate_risk_N | 58.09 [57.01, 59.16], <0.001 | 0.04 [-0.51, 0.59], 0.877 | -0.12 [-0.21, -0.03], 0.008 |
| avg_moderate_rew_Y | 56.05 [54.74, 57.36], <0.001 | -0.48 [-1.40, 0.44], 0.308 | 0.10 [-0.05, 0.25], 0.204 |
| avg_moderate_rew_N | 53.28 [52.25, 54.31], <0.001 | 0.60 [-0.04, 1.25], 0.066 | -0.03 [-0.13, 0.08], 0.627 |
| avg_high_risk_N | 72.22 [71.08, 73.36], <0.001 | -0.30 [-0.85, 0.26], 0.292 | -0.10 [-0.19, -0.01], 0.023 |
| avg_high_rew_N | 51.09 [49.91, 52.27], <0.001 | 1.82 [1.10, 2.55], <0.001 | -0.15 [-0.26, -0.04], 0.008 |

Random effects and model fit:

| Outcome | σ² | τ00 (id) | τ11 (time) | τ11 (time²) | ρ01 (time) | ρ01 (time²) | ICC | N (id) | Observations | Marginal R² / Conditional R² |
|---|---|---|---|---|---|---|---|---|---|---|
| yes_counts | 3.74 | 11.04 | 0.98 | 0.02 | -0.47 | 0.43 | 0.72 | 1754 | 6438 | 0.015 / 0.721 |
| no_counts | 36.43 | 95.27 | 14.63 | 0.21 | -0.32 | 0.21 | 0.72 | 1754 | 6438 | 0.020 / 0.723 |
| avg_risk_Y | 79.27 | 250.56 | 12.46 | 0.21 | 0.02 | -0.10 | 0.77 | 1748 | 6404 | 0.001 / 0.775 |
| avg_risk_N | 109.32 | 359.04 | 16.02 | 0.23 | -0.12 | 0.04 | 0.77 | 1655 | 5385 | 0.004 / 0.773 |
| avg_rew_Y | 95.51 | 212.49 | 7.63 | 0.06 | -0.02 | -0.15 | — | 1748 | 6404 | 0.004 / NA |
| avg_rew_N | 130.07 | 259.04 | 18.30 | 0.16 | -0.17 | 0.09 | — | 1655 | 5385 | 0.012 / NA |
| avg_low_risk_Y | 82.59 | 241.21 | 15.59 | 0.29 | -0.00 | -0.04 | 0.76 | 1744 | 6379 | 0.001 / 0.764 |
| avg_low_rew_Y | 114.69 | 222.00 | 9.11 | 0.08 | 0.02 | -0.21 | — | 1744 | 6379 | 0.004 / NA |
| avg_moderate_risk_Y | 169.59 | 344.29 | 20.37 | 0.34 | -0.01 | -0.06 | 0.69 | 1601 | 4846 | 0.002 / 0.691 |
| avg_moderate_risk_N | 80.90 | 338.30 | 13.89 | 0.20 | -0.11 | 0.03 | 0.81 | 1537 | 4744 | 0.004 / 0.813 |
| avg_moderate_rew_Y | 274.37 | 345.65 | 23.33 | 0.33 | 0.08 | -0.40 | 0.58 | 1601 | 4847 | 0.000 / 0.585 |
| avg_moderate_rew_N | 122.48 | 248.71 | 14.83 | 0.15 | -0.15 | -0.09 | 0.68 | 1537 | 4744 | 0.002 / 0.679 |
| avg_high_risk_N | 81.12 | 411.22 | 18.44 | 0.25 | -0.18 | 0.12 | 0.84 | 1582 | 4994 | 0.006 / 0.838 |
| avg_high_rew_N | 140.86 | 369.69 | 37.25 | 0.48 | -0.24 | 0.20 | — | 1582 | 4994 | 0.029 / NA |
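
When choosing between the linear and quadratic specifications, a likelihood-ratio test of the nested fits is one option. A sketch using two models fit earlier in this document (both were estimated with REML = FALSE, as required when comparing models that differ in their fixed effects):

Code
# Likelihood-ratio test: random-slope linear model vs. random quadratic model
anova(model_random_slope_avg_risk_Y, model_quad_avg_risk_Y)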

Finally, visualizations of the predicted trajectories from these models are generated, using polynomial regression lines to depict dynamic changes and individual variability over time.

Code
# Plot model-implied trajectories, quadratic
ggplot(covid_long_final, aes(x = time, y = predict(model_quad_num_beh_Y_sum))) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x + I(x^2), se = FALSE,
              color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Sum Number of 'Yes' Behaviors") +
  ylim(0, 40)

Code
ggplot(covid_long_final, aes(x = time, y = predict(model_quad_num_beh_N_sum))) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x + I(x^2), se = FALSE,
              color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Sum Number of 'No' Behaviors") +
  ylim(0, 40)

Code
# Plot model-implied trajectories, linear
ggplot(covid_long_final, aes(x = time, y = predict(model_random_slope_num_beh_Y_sum))) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x, se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Sum Number of 'Yes' Behaviors") +
  ylim(0, 40)

Code
ggplot(covid_long_final, aes(x = time, y = predict(model_random_slope_num_beh_N_sum))) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x, se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Sum Number of 'No' Behaviors") +
  ylim(0, 40)

Code
### Imputation code -- needed because the predicted values have fewer rows than the original data frame
# Calculate predictions for each model
quad_predicted_avg_risk_Y <- predict(model_quad_avg_risk_Y, re.form = NA)
quad_predicted_avg_risk_N <- predict(model_quad_avg_risk_N, re.form = NA)
quad_predicted_avg_rew_Y <- predict(model_quad_avg_rew_Y, re.form = NA)
quad_predicted_avg_rew_N <- predict(model_quad_avg_rew_N, re.form = NA)
lin_predicted_avg_risk_Y <- predict(model_random_slope_avg_risk_Y, re.form = NA)
lin_predicted_avg_risk_N <- predict(model_random_slope_avg_risk_N, re.form = NA)
lin_predicted_avg_rew_Y <- predict(model_random_slope_avg_rew_Y, re.form = NA)
lin_predicted_avg_rew_N <- predict(model_random_slope_avg_rew_N, re.form = NA)

# Add predicted values to the dataframe, initializing with NA
covid_long_final$quad_predicted_avg_risk_Y <- NA
covid_long_final$quad_predicted_avg_risk_N <- NA
covid_long_final$quad_predicted_avg_rew_Y <- NA
covid_long_final$quad_predicted_avg_rew_N <- NA
covid_long_final$lin_predicted_avg_risk_Y <- NA
covid_long_final$lin_predicted_avg_risk_N <- NA
covid_long_final$lin_predicted_avg_rew_Y <- NA
covid_long_final$lin_predicted_avg_rew_N <- NA

# Fill in the predicted values where available, matching by the row names that
# predict() attaches (indexing with !is.na() of the shorter prediction vector
# would be silently recycled against the longer data frame column)
covid_long_final$quad_predicted_avg_risk_Y[match(names(quad_predicted_avg_risk_Y), rownames(covid_long_final))] <- quad_predicted_avg_risk_Y
covid_long_final$quad_predicted_avg_risk_N[match(names(quad_predicted_avg_risk_N), rownames(covid_long_final))] <- quad_predicted_avg_risk_N
covid_long_final$quad_predicted_avg_rew_Y[match(names(quad_predicted_avg_rew_Y), rownames(covid_long_final))] <- quad_predicted_avg_rew_Y
covid_long_final$quad_predicted_avg_rew_N[match(names(quad_predicted_avg_rew_N), rownames(covid_long_final))] <- quad_predicted_avg_rew_N
covid_long_final$lin_predicted_avg_risk_Y[match(names(lin_predicted_avg_risk_Y), rownames(covid_long_final))] <- lin_predicted_avg_risk_Y
covid_long_final$lin_predicted_avg_risk_N[match(names(lin_predicted_avg_risk_N), rownames(covid_long_final))] <- lin_predicted_avg_risk_N
covid_long_final$lin_predicted_avg_rew_Y[match(names(lin_predicted_avg_rew_Y), rownames(covid_long_final))] <- lin_predicted_avg_rew_Y
covid_long_final$lin_predicted_avg_rew_N[match(names(lin_predicted_avg_rew_N), rownames(covid_long_final))] <- lin_predicted_avg_rew_N

# Handle NA predicted values by assigning the mean of the available predictions
quad_mean_predicted_risk_Y <- mean(quad_predicted_avg_risk_Y, na.rm = TRUE)
quad_mean_predicted_risk_N <- mean(quad_predicted_avg_risk_N, na.rm = TRUE)
quad_mean_predicted_rew_Y <- mean(quad_predicted_avg_rew_Y, na.rm = TRUE)
quad_mean_predicted_rew_N <- mean(quad_predicted_avg_rew_N, na.rm = TRUE)
lin_mean_predicted_risk_Y <- mean(lin_predicted_avg_risk_Y, na.rm = TRUE)
lin_mean_predicted_risk_N <- mean(lin_predicted_avg_risk_N, na.rm = TRUE)
lin_mean_predicted_rew_Y <- mean(lin_predicted_avg_rew_Y, na.rm = TRUE)
lin_mean_predicted_rew_N <- mean(lin_predicted_avg_rew_N, na.rm = TRUE)

covid_long_final$quad_predicted_avg_risk_Y[is.na(covid_long_final$quad_predicted_avg_risk_Y)] <- quad_mean_predicted_risk_Y
covid_long_final$quad_predicted_avg_risk_N[is.na(covid_long_final$quad_predicted_avg_risk_N)] <- quad_mean_predicted_risk_N
covid_long_final$quad_predicted_avg_rew_Y[is.na(covid_long_final$quad_predicted_avg_rew_Y)] <- quad_mean_predicted_rew_Y
covid_long_final$quad_predicted_avg_rew_N[is.na(covid_long_final$quad_predicted_avg_rew_N)] <- quad_mean_predicted_rew_N
covid_long_final$lin_predicted_avg_risk_Y[is.na(covid_long_final$lin_predicted_avg_risk_Y)] <- lin_mean_predicted_risk_Y
covid_long_final$lin_predicted_avg_risk_N[is.na(covid_long_final$lin_predicted_avg_risk_N)] <- lin_mean_predicted_risk_N
covid_long_final$lin_predicted_avg_rew_Y[is.na(covid_long_final$lin_predicted_avg_rew_Y)] <- lin_mean_predicted_rew_Y
covid_long_final$lin_predicted_avg_rew_N[is.na(covid_long_final$lin_predicted_avg_rew_N)] <- lin_mean_predicted_rew_N
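
# Note (alternative, not part of the original pipeline): the NA bookkeeping
# above -- initializing with NA, filling where available, and mean-imputing --
# can be sidestepped by predicting on the full data frame with fixed effects
# only, e.g.
#   predict(model_quad_avg_risk_Y, newdata = covid_long_final, re.form = NA)
# which returns a value for every row as long as `time` is never missing.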

# Plotting all four graphs, quadratic
q1 <- ggplot(covid_long_final, aes(x = time, y = quad_predicted_avg_risk_Y)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x + I(x^2), se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'Yes' Risk Score")

q2 <- ggplot(covid_long_final, aes(x = time, y = quad_predicted_avg_risk_N)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x + I(x^2), se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'No' Risk Score")

q3 <- ggplot(covid_long_final, aes(x = time, y = quad_predicted_avg_rew_Y)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x + I(x^2), se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'Yes' Reward Score")

q4 <- ggplot(covid_long_final, aes(x = time, y = quad_predicted_avg_rew_N)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x + I(x^2), se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'No' Reward Score")

# Combine and display the plots
grid.arrange(q1, q2, ncol = 2)

Code
grid.arrange(q3, q4, ncol = 2)

Code
# Plotting all four graphs, linear
l1 <- ggplot(covid_long_final, aes(x = time, y = lin_predicted_avg_risk_Y)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x, se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'Yes' Risk Score")

l2 <- ggplot(covid_long_final, aes(x = time, y = lin_predicted_avg_risk_N)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x, se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'No' Risk Score")

l3 <- ggplot(covid_long_final, aes(x = time, y = lin_predicted_avg_rew_Y)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x, se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'Yes' Reward Score")

l4 <- ggplot(covid_long_final, aes(x = time, y = lin_predicted_avg_rew_N)) +
  geom_smooth(aes(group = id), method = 'lm', formula = y ~ x, se = FALSE, color = "black", size = .1) +
  labs(x = "Time Point", y = "Predicted Average 'No' Reward Score")

# Combine and display the plots
grid.arrange(l1, l2, ncol = 2)

Code
grid.arrange(l3, l4, ncol = 2)

Session Information

To enhance reproducibility, details about the working environment used for these analyses can be found below.

Code
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] patchwork_1.3.0     PupillometryR_0.0.5 rlang_1.1.4        
 [4] ggridges_0.5.6      sjPlot_2.8.16       gridExtra_2.3      
 [7] broom.mixed_0.2.9.5 kableExtra_1.4.0    lubridate_1.9.3    
[10] forcats_1.0.0       stringr_1.5.1       purrr_1.0.2        
[13] readr_2.1.5         tibble_3.2.1        tidyverse_2.0.0    
[16] tidyr_1.3.1         sampling_2.10       interactions_1.2.0 
[19] performance_0.12.3  dplyr_1.1.4         nlme_3.1-164       
[22] lmerTest_3.1-3      lme4_1.1-35.5       Matrix_1.7-0       
[25] ggplot2_3.5.1       psych_2.4.6.26      pacman_0.5.1       

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1    sjlabelled_1.2.0    viridisLite_0.4.2  
 [4] farver_2.1.2        fastmap_1.2.0       bayestestR_0.14.0  
 [7] jtools_2.3.0        sjstats_0.19.0      digest_0.6.37      
[10] timechange_0.3.0    lifecycle_1.0.4     magrittr_2.0.3     
[13] compiler_4.4.1      tools_4.4.1         utf8_1.2.4         
[16] yaml_2.3.10         knitr_1.48          labeling_0.4.3     
[19] mnormt_2.1.1        xml2_1.3.6          withr_3.0.1        
[22] numDeriv_2016.8-1.1 grid_4.4.1          datawizard_0.12.3  
[25] fansi_1.0.6         colorspace_2.1-1    future_1.34.0      
[28] globals_0.16.3      scales_1.3.0        MASS_7.3-60.2      
[31] insight_0.20.5      cli_3.6.3           rmarkdown_2.28     
[34] generics_0.1.3      rstudioapi_0.16.0   tzdb_0.4.0         
[37] parameters_0.22.2   minqa_1.2.8         pander_0.6.5       
[40] splines_4.4.1       parallel_4.4.1      effectsize_0.8.9   
[43] vctrs_0.6.5         boot_1.3-30         jsonlite_1.8.9     
[46] hms_1.1.3           listenv_0.9.1       systemfonts_1.1.0  
[49] glue_1.8.0          parallelly_1.38.0   nloptr_2.1.1       
[52] codetools_0.2-20    stringi_1.8.4       gtable_0.3.5       
[55] ggeffects_1.7.1     munsell_0.5.1       furrr_0.3.1        
[58] pillar_1.9.0        htmltools_0.5.8.1   R6_2.5.1           
[61] lpSolve_5.6.21      evaluate_1.0.0      lattice_0.22-6     
[64] backports_1.5.0     broom_1.0.7         Rcpp_1.0.13        
[67] svglite_2.1.3       mgcv_1.9-1          xfun_0.48          
[70] sjmisc_2.8.10       pkgconfig_2.0.3